DISCRIMINATION OF COMPLEX STIMULI: THE
RELATIONSHIP OF TRAINING AND TEST
STIMULI IN TRANSFER OF
DISCRIMINATION 1, 2


KENNETH H. KURTZ 


Yale University 


It has been suggested by Gibson (2) 
that a set of stimuli which have been 
differentiated in one learning problem 
will tend in subsequent problems to 
generalize (a) less among themselves 
and (b) less with new stimulus items.
Positive transfer of discrimination 
training from one learning task to 
another task employing the same set 
of stimuli has been clearly demon- 
strated (1, 7), but the problem of 
transfer when new stimuli are em- 
ployed has received little attention. 
In the present study an attempt is 
made to demonstrate that the same 
training procedures may result in 
either positive or negative transfer to 
novel stimuli, depending upon the 
relationships between the stimuli em- 
ployed in the two tasks. 


1 The present article reports an extension of 
part of a dissertation presented for the degree of 
Doctor of Philosophy in Yale University. 

2 The writer wishes to express his indebtedness 
to Drs. C. I. Hovland, F. D. Sheffield, and R. P. 
Abelson, members of his thesis committee, for 
invaluable advice and assistance throughout all 
stages of the present research. He wishes also 
to thank Drs. M. A. May, B. Rosner, and F. A. 
Logan, members of his reading committee, for 
their helpful suggestions in preparing the present 
report. 


The predictions studied are based 
upon an observing-response formu- 
lation of discrimination learning simi- 
lar to formulations advanced by 
Spence (8) and by Miller and Dollard 
(6). According to this formulation, 
an individual may learn to react to 
similar stimuli with an observational 
response, or sequence of responses, 
which results in more distinctive 
stimulation from the stimuli to be 
discriminated, and which, like any 
other response, may become selec- 
tively associated with a discriminative 
cue. (In the case of two similar 
stimulus complexes, the discrimina- 
tive cue for an observing response 
will usually be a gross, readily per- 
ceived property common to the two 
complexes, while the cues made dis- 
tinct by the observing response will 
be more subtle, distinguishing prop- 
erties.) By virtue of their stimulus 
connections, observing responses 
learned in one discrimination problem 
will tend to generalize to subsequent 
problems employing stimuli with simi- 
lar gross characteristics, whether or 
not the distinguishing properties are 
the same. 
An assumption of key importance
for present purposes is that any given 
observing response will distinguish 
only certain types of stimulus differ- 
ences and not others. (To illustrate, 
counting will serve to distinguish two 
arrays of dots differing in number, 
but not two arrays differing in pattern 
and alike in number.) It follows that 
when an observing response gener- 
alizes from one learning task to an- 
other, positive or negative transfer 
will be obtained according to whether 
the distinguishing characteristics in 
the two tasks are the same or different. 
If the distinguishing characteristics 
are the same, the transferred ob- 
serving response will differentiate the 
stimuli of the second task, with the 
result that the trial-and-error period 
to discover a successful observing 
response will be eliminated and the 
learning time reduced. If, on the 
other hand, the distinguishing char- 
acteristics are different in the two 
tasks, the learning time will be in- 
creased, since the transferred ob- 
serving response must first be ex- 
tinguished and then the same time 
spent in trial and error as if no 
transfer had occurred. 
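The counting illustration can be expressed as a small toy model. The sketch is an illustration only, not part of the original study: a dot array is represented as a set of grid positions, an observing response is any function from an array to the cue it yields, and a transferred observing response is predicted to help on a new pair only if it yields different cues for the two members. All names (`count_dots`, `distinguishes`, `predicted_transfer`) are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Tuple

DotArray = FrozenSet[Tuple[int, int]]               # toy stimulus: set of (row, col) dots
ObservingResponse = Callable[[DotArray], Hashable]  # maps a stimulus to the cue it yields


def count_dots(array: DotArray) -> int:
    """Counting: the cue is the number of dots, so spatial arrangement is ignored."""
    return len(array)


def distinguishes(response: ObservingResponse, a: DotArray, b: DotArray) -> bool:
    """An observing response differentiates a pair only if it yields different cues."""
    return response(a) != response(b)


def predicted_transfer(response: ObservingResponse, a: DotArray, b: DotArray) -> str:
    """Direction of transfer when `response` generalizes to the new pair (a, b)."""
    return "positive" if distinguishes(response, a, b) else "negative"


three_in_a_row = frozenset({(0, 0), (0, 1), (0, 2)})
four_in_a_row = frozenset({(0, 0), (0, 1), (0, 2), (0, 3)})
three_in_a_column = frozenset({(0, 0), (1, 0), (2, 0)})

print(predicted_transfer(count_dots, three_in_a_row, four_in_a_row))      # positive
print(predicted_transfer(count_dots, three_in_a_row, three_in_a_column))  # negative
```

Here "negative" stands for the extra learning time the formulation attributes to extinguishing an observing response that fails to differentiate the new pair; the toy model does not simulate learning time itself.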


METHOD


A comparison was made of the discrimina- 
bility of pairs of similar stimuli in a test task 
following four treatments of prior training. The 
four treatments comprised three different famil- 
iarization conditions and a control condition 
without familiarization. The task used for test 
purposes was a paired-associates learning prob- 
lem; that used for purposes of prior familiariza- 
tion consisted of a series of short-delay recogni- 
tion tests. In the first familiarization condition 
the training stimuli were identical to the test 
stimuli; in the second condition the familiar- 
ization and test pairs were not identical, but in 
both pairs the two stimuli differed by the same 
property; in the third condition the test pair 
included one of the original stimuli and a sec- 
ond, novel stimulus which differed from the first 
in a different property than had the corre- 
sponding stimulus of the familiarization pair. 
Fig. 1. Stimulus sets A, B, C, and D. Complete
array of stimulus figures from which stimuli
were selected for familiarization and paired-
associates learning (approximately … actual size).
It was predicted that, relative to the control
condition, positive transfer would be obtained 
under the first two familiarization conditions, 
negative transfer under the third. 


Stimulus Materials 


The stimulus materials employed in the two 
discrimination problems were selected from the 
array of 16 stimuli shown in Fig. 1. The num- 
bers appearing beneath the stimuli in Fig. 1 are 
for purposes of identification in the present report 
and did not appear during the experiment. It 
will readily be seen that the 16 stimuli comprise 
four sets of four similar stimuli each. The four 
stimuli within each set can be classified according 
to two independent properties or dimensions, 
each dimension having two possible values. The 
two dimensions of variation in Set A are position 
of the horizontal line and position of the middle 
bar; in Set B, length of the topmost horizontal 
line and length of the center vertical line; in Set 
C, shape of the eyebrows and shape of the hair; 
and in Set D, tilt of the quadrilateral and shape 
of the lower black portion. Duplicates of these 
stimuli for use in the two discrimination prob- 
lems were prepared by lithographic reproduction. 

Familiarization cards.—Materials for famil-
iarization consisted of 3 × 5 in. index cards on
each of which were mounted either two different 
stimuli from the same set or two reproductions 
of a single stimulus. The two stimuli or repro- 
ductions of the same stimulus were mounted so 
that one was immediately above the other. 
Using Stimuli 1 and 2 of Set A, a pack of four 
cards was prepared as follows: Card 1 had two 
reproductions of Stimulus 1; Card 2 had two 
reproductions of Stimulus 2; Card 3 had Stimulus
1 above Stimulus 2; and Card 4 had Stimulus 2 
above Stimulus 1. By sliding these cards behind 
a screen with appropriately placed apertures, 
first the upper and then the lower stimulus could 
be presented to S’s view. 

Four different packs of four cards each were 
made in the manner described, a different com- 
bination of two stimuli from Set A being utilized 
in the preparation of each pack. One pack in- 
cluded Stimuli 1 and 2; a second pack, Stimuli 
1 and 3; a third pack, Stimuli 3 and 4; and a 
fourth pack, Stimuli 2 and 4. Twelve additional
packs, comprising four analogous packs from 
each of the remaining three sets of stimuli, were 
prepared. 
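The card packs described above can be enumerated mechanically. The sketch assumes a hypothetical encoding of each card as a (set, upper stimulus, lower stimulus) tuple; `make_pack` and `PACK_PAIRS` are illustrative names, not part of the original materials.

```python
from itertools import chain

Card = tuple[str, int, int]  # (stimulus set, upper stimulus, lower stimulus)

# The four pairings of Stimuli 1-4 used for the packs of every set.
PACK_PAIRS = [(1, 2), (1, 3), (3, 4), (2, 4)]


def make_pack(stim_set: str, a: int, b: int) -> list[Card]:
    """Four cards: two 'same' cards and two 'different' cards in both orders."""
    return [
        (stim_set, a, a),  # Card 1: two reproductions of Stimulus a
        (stim_set, b, b),  # Card 2: two reproductions of Stimulus b
        (stim_set, a, b),  # Card 3: Stimulus a above Stimulus b
        (stim_set, b, a),  # Card 4: Stimulus b above Stimulus a
    ]


packs = {(s, a, b): make_pack(s, a, b) for s in "ABCD" for a, b in PACK_PAIRS}
all_cards = list(chain.from_iterable(packs.values()))
print(len(packs), len(all_cards))  # 16 64
```

Half of all cards call for the answer "same" and half for "different", so neither verbal response is favored during familiarization.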

Paired-associates lists.—Two comparable
paired-associates lists were utilized. For pur- 
poses of illustration the stimulus and response 
items employed in one of the two lists are pre- 
sented in Fig. 2. It will be noted from Fig. 2 
that the stimulus items of this list consisted of 
Stimuli 1 and 2 from each of the four sets shown 
in Fig. 1 and that the response items consisted 
of eight different color names. The second list 
was identical to the first except that Stimulus 2 
of each set was replaced in List 2 by Stimulus 3 
of the same set. Half of the Ss learned one list; 
half learned the other. The purpose of employing
two lists was to utilize both dimensions
of each stimulus set.

Fig. 2. Stimulus and response items employed
in paired-associates learning, List 1.

The eight pairs of stimulus and response items 
in a given list were arranged on a paper tape for 
presentation in a modified Hull memory drum. 
The tape for each list included four different 
permutations of the eight stimulus-response pairs 
in that list, i.e., a total of 32 stimulus-response 
pairs. A given stimulus item was always paired 
with the same response item throughout all four 
permutations of a given list. The response items 
were typewritten in lower-case letters. The 
tapes for the two lists were identical except that 
wherever Stimulus 2 of any set appeared in the 
tape for List 1, Stimulus 3 of the same set ap- 
peared in the tape for List 2. The four per- 
mutations in each tape were random within the 
following restrictions: (a) no stimulus was im- 
mediately followed by any other stimulus more 
than once in the entire sequence, (b) every oc- 
currence of a stimulus from any given set was 
separated from another occurrence of the same 
stimulus or the occurrence of the other stimulus 
belonging to the same set by the occurrence of at 
least two other stimuli, and (c) two stimuli be- 
longing to the same set were not allowed to occur 
throughout all four permutations in single or 
double alternation. 
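The three restrictions on the tape order can be stated as checks on a candidate sequence. This is a sketch under several assumptions: stimuli carry hypothetical labels such as "A1" whose first character names the set; the sequence is treated as linear (the wrap-around from the last permutation back to the first is not checked); and "double alternation" in (c) is read as the pattern AABB... in either phase. A random order satisfying all three could then be found by repeatedly shuffling the permutations and rejecting any sequence that fails a check.

```python
from collections import Counter
from typing import Sequence


def set_of(stimulus: str) -> str:
    """Hypothetical labels such as 'A1' or 'C3': the letter names the stimulus set."""
    return stimulus[0]


def violates_a(tape: Sequence[str]) -> bool:
    """(a) No stimulus may be immediately followed by the same other stimulus twice."""
    transitions = Counter(zip(tape, tape[1:]))
    return any(n > 1 for n in transitions.values())


def violates_b(tape: Sequence[str]) -> bool:
    """(b) Successive occurrences from one set need at least two other stimuli between."""
    last_index = {}
    for i, stimulus in enumerate(tape):
        s = set_of(stimulus)
        if s in last_index and i - last_index[s] - 1 < 2:
            return True
        last_index[s] = i
    return False


def is_alternation(order: Sequence[str]) -> bool:
    """True for single (ABAB...) or double (AABB..., either phase) alternation."""
    if len(set(order)) != 2:
        return False
    same = [order[i] == order[i + 1] for i in range(len(order) - 1)]
    if not any(same):
        return True  # single alternation: every step switches stimulus
    for start in (0, 1):
        # double alternation: repeats at every other step, switches in between
        if all(same[i] == ((i - start) % 2 == 0) for i in range(len(same))):
            return True
    return False


def violates_c(tape: Sequence[str]) -> bool:
    """(c) The two stimuli of a set may not run in single or double alternation."""
    for s in {set_of(x) for x in tape}:
        if is_alternation([x for x in tape if set_of(x) == s]):
            return True
    return False
```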


Procedure and Design 


The complete number of discriminations em- 
bodied in either of the two paired-associates lists 
may be partitioned into (a) discriminations 
among grossly different stimuli belonging to 
different sets, and (b) discriminations between 
highly similar stimuli belonging to the same set. 
The discriminations among the four sets are 
relatively easy; those within the four sets are 
relatively difficult and are the discriminations of 
primary interest for present purposes. 

For any given S the paired-associates stimuli 
of different sets were subjected to different con- 
ditions of prior familiarization: the two stimuli 
from one set were assigned to one condition, the 
two stimuli of another set to another condition, 
etc. Information concerning all four familiari- 
zation conditions was thereby obtained con- 
currently from each S during paired-associates 
learning. 

Familiarization training.—A control con-
dition was provided by excluding the stimuli of
one set from the familiarization training of each 
S. From each of the remaining three sets two 
stimuli were selected for familiarization training. 
These pairs of stimuli were selected so as to 
provide three different familiarization treatments 
distinguished by the relationship between the 
familiarization and paired-associates stimuli. 
For Ss scheduled to learn paired-associates List 
1, which included Stimuli 1 and 2 from each set, 
the four experimental conditions and the stimuli
employed during familiarization were as follows: 


1. Familiarization stimuli identical to paired- 
associates stimuli. For this condition Stimuli 1 
and 2 of one set were employed during famil- 
iarization training. 

2. Familiarization stimuli not identical to, but 
distinguished by same property as, paired- 
associates stimuli. For this condition Stimuli 3 
and 4 of a second set were employed during 
familiarization training. 

3. Familiarization stimuli distinguished by a 
different property than paired-associates stimuli. 
For this condition either Stimuli 1 and 3 or 
Stimuli 2 and 4 of a third set were employed 
during familiarization training, both pairs 
bearing the specified relationship to paired- 
associates Stimuli 1 and 2. Which of the two 
pairs was actually employed was determined 
randomly for each S. 

4. Control. The stimuli of a fourth set were 
omitted from familiarization training to provide 
a control condition. 


For Ss scheduled to learn paired-associates 
List 2, which included Stimuli 1 and 3 from each 
set, the corresponding stimuli for the three fa- 
miliarization conditions were 1 and 3, 2 and 4, 
and either 1 and 2, or 3 and 4, respectively. 
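The assignment of familiarization pairs, and the prediction attached to each condition, can be checked against an inferred coding of the four stimuli in a set. The coding in `CODE` is an assumption derived only from the pairings listed above (Stimuli 1 and 2, like 3 and 4, differ on one dimension; Stimuli 1 and 3, like 2 and 4, differ on the other); it does not identify which physical property of Fig. 1 each dimension corresponds to, and the dimension names "p" and "q" are placeholders.

```python
# Hypothetical 2 x 2 coding inferred from the pairings in the text (not from Fig. 1):
# Stimuli 1-2 and 3-4 differ on dimension "p"; Stimuli 1-3 and 2-4 differ on "q".
CODE = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}


def differing_dimension(a: int, b: int) -> str:
    """Name the single dimension on which Stimuli a and b differ."""
    (pa, qa), (pb, qb) = CODE[a], CODE[b]
    if pa != pb and qa == qb:
        return "p"
    if qa != qb and pa == pb:
        return "q"
    raise ValueError(f"Stimuli {a} and {b} do not differ on exactly one dimension")


def familiarization_pairs(list_no: int) -> dict[str, list[tuple[int, int]]]:
    """Familiarization pairs per condition, as given in the Procedure."""
    if list_no == 1:  # paired-associates Stimuli 1 and 2
        return {"identical": [(1, 2)], "same property": [(3, 4)],
                "different property": [(1, 3), (2, 4)], "control": []}
    if list_no == 2:  # paired-associates Stimuli 1 and 3
        return {"identical": [(1, 3)], "same property": [(2, 4)],
                "different property": [(1, 2), (3, 4)], "control": []}
    raise ValueError("list_no must be 1 or 2")


def predicted_transfer(list_no: int) -> dict[str, str]:
    """Predicted direction of transfer for each condition, relative to control."""
    test_dim = differing_dimension(*((1, 2) if list_no == 1 else (1, 3)))
    prediction = {}
    for condition, pairs in familiarization_pairs(list_no).items():
        if not pairs:
            prediction[condition] = "baseline"  # no familiarization
        else:
            dims = {differing_dimension(a, b) for a, b in pairs}
            prediction[condition] = "positive" if dims == {test_dim} else "negative"
    return prediction


print(predicted_transfer(1))
# {'identical': 'positive', 'same property': 'positive', 'different property': 'negative', 'control': 'baseline'}
```

Both lists yield the same pattern of predictions, which is what allows the two lists to be pooled across dimensions.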

In preparation for the familiarization of a 
given S, the three appropriate familiarization 
packs, each containing four cards as described in 
the section on materials, were shuffled together 
so that the cards from the three packs were 
intermixed. Familiarization commenced by 
presenting the 12 cards to S one at a time. 
During this procedure E and S sat facing one 
another at a table. Between them was a screen 
10 in. high in which appeared two apertures 
through which one and then the other stimulus 
on each card were presented to S’s view. The 
two stimuli on one card were presented for 
approximately 1 sec. each, with an interval of 
approximately 1 sec. separating presentation of 
the two, and S was required after the presen- 
tation of each two stimuli to specify whether 
they were the “same” or “different.” If S’s 
response was correct, E proceeded to the next 
card; if S’s response was incorrect, he was cor- 
rected verbally by E and shown both stimuli 
together until S indicated he was ready to pro- 
ceed with the next card. After each run through 
the entire 12 cards, any four-card pack on which 
S made no errors was removed; the remaining 
packs were reshuffled and again presented in the 
above fashion. This procedure was repeated 
until S had attained one perfect run through each 
four-card pack. All 12 cards were then re-
shuffled and again presented one at a time. The
entire procedure was repeated until S completed
two successive errorless runs through all 12 cards.
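The familiarization schedule amounts to two loops: a dropout phase in which a pack leaves the deck once it has been run without error, and a final phase that continues until two successive errorless runs through all 12 cards. The sketch below simulates it for a hypothetical subject supplied as a `respond` function; verbal correction by E is not modeled, and stimulus identities are arbitrary integers.

```python
import random
from typing import Callable

Card = tuple[int, int]  # (upper stimulus, lower stimulus); ids are arbitrary


def correct_answer(card: Card) -> str:
    return "same" if card[0] == card[1] else "different"


def make_pack(a: int, b: int) -> list[Card]:
    return [(a, a), (b, b), (a, b), (b, a)]


def run_familiarization(packs: list[list[Card]],
                        respond: Callable[[Card], str],
                        rng: random.Random) -> int:
    """Simulate the schedule; return the total number of card presentations."""
    presentations = 0

    # Phase 1: shuffle the remaining packs together; remove each pack run without error.
    remaining = list(range(len(packs)))
    while remaining:
        deck = [(p, card) for p in remaining for card in packs[p]]
        rng.shuffle(deck)
        missed = set()
        for p, card in deck:
            presentations += 1
            if respond(card) != correct_answer(card):
                missed.add(p)  # E corrects S; the pack stays for the next run
        remaining = [p for p in remaining if p in missed]

    # Phase 2: all cards, reshuffled each run, until two successive errorless runs.
    deck = [card for pack in packs for card in pack]
    errorless_streak = 0
    while errorless_streak < 2:
        rng.shuffle(deck)
        errors = sum(respond(card) != correct_answer(card) for card in deck)
        presentations += len(deck)
        errorless_streak = errorless_streak + 1 if errors == 0 else 0
    return presentations


packs = [make_pack(a, b) for a, b in [(1, 2), (3, 4), (5, 6)]]
print(run_familiarization(packs, correct_answer, random.Random(0)))  # 36
```

With a subject who always answers correctly, the three packs require 12 presentations in the first phase and 24 in the second, 36 in all.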

Paired-associates learning.—Immediately fol-
lowing completion of familiarization training, S
was instructed for paired-associates learning and
started on that task. The S was informed that 
the stimulus items would be a series of figures, 
some of which were similar to the ones used 
during familiarization, and that the response 
items would be names of colors. The in- 
structions requested S not to select certain pairs 
for special attention, but, as far as possible, to
work equally on all the pairs. 

The appropriate paired-associates list was 
presented in a modified Hull memory drum. 
Learning was by the anticipation method. The 
stimulus figures and their color-name associates 
were presented alternately in the same aperture 
for 2 sec. each. Learning continued without 
interruption until S reached a criterion of one 
errorless cycle through four successive per- 
mutations of the eight stimulus-response pairs. 
The E sat behind S. 

Design.—An equal number of Ss was ran-
domly assigned to each of the two paired-
associates lists. In the case of each S, the
paired-associates stimuli from the four different 
sets were assigned to four different experimental 
conditions. The condition to which the stimuli 
of a particular set were assigned varied among 
Ss and was counterbalanced in a randomized 
latin-square design. For each list four Ss were 
required to complete a 4 X 4 latin square in 
which each pair of stimuli was assigned an equal 
number of times to each experimental condition. 
A complete replication including both lists re- 
quired two such latin squares—one for each 
list—and hence consisted of eight Ss. The two 
latin squares in a replication were selected inde- 
pendently and new squares were drawn for each 
replication. Five replications, totaling 40 Ss, 
were made. 
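The counterbalancing scheme can be sketched as follows. Rows of each 4 × 4 latin square are Ss, columns are stimulus sets, and entries are familiarization conditions, so within a square each set is assigned to each condition exactly once. The randomization shown here (permuting the rows, columns, and symbols of a cyclic square) is an assumed shortcut: it does not sample uniformly from all 4 × 4 latin squares, and the report does not say how its squares were drawn. The seed is arbitrary.

```python
import random

CONDITIONS = ["identical", "same property", "different property", "control"]
SETS = ["A", "B", "C", "D"]


def random_latin_square(n: int, rng: random.Random) -> list[list[int]]:
    """Cyclic square with rows, columns, and symbols randomly permuted.

    Every such square is latin, but not every latin square is equally likely.
    """
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    symbols = rng.sample(range(n), n)
    return [[symbols[base[r][c]] for c in cols] for r in rows]


def assignments(replications: int, rng: random.Random) -> list[dict]:
    """One record per S: replication, list number, and condition for each set."""
    subjects = []
    for rep in range(1, replications + 1):
        for list_no in (1, 2):  # a freshly drawn square for each list
            square = random_latin_square(len(SETS), rng)
            for row in square:  # one row per S
                subjects.append({
                    "replication": rep,
                    "list": list_no,
                    "conditions": {s: CONDITIONS[k] for s, k in zip(SETS, row)},
                })
    return subjects


design = assignments(replications=5, rng=random.Random(1954))
print(len(design))  # 40
```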

Subjects.—The Ss were 45 male students in the
undergraduate, graduate, and professional schools 
at Yale University. They were hired through 
the University’s Student Appointment Bureau. 
Because of the failure of five Ss to complete the 
procedure in the hour provided for each session, 
the records of these Ss were rejected, leaving a 
total of 40 usable records. 


RESULTS 


Paired-associates learning was 
treated as concurrent performance on 
four different discrimination prob- 
lems: the two stimuli from one set 
constituted one problem, the two 
stimuli from another set another
problem, etc. The performance of a 
given S on the four problems was 
treated separately to provide data 
concerning each of the four different 
conditions of prior familiarization. 
Trials to mastery.—The criterion of 
mastery of a given set was seven 
correct responses out of eight con- 
secutive presentations—four of each 
stimulus—of the stimuli of that set. 
In other words, a maximum of one 
error or failure to anticipate was 
allowed on that pair during a complete 
cycle of the tape, in which all four 
permutations of the list were presented.
To reduce the probability
that this criterion might be achieved 
on the basis of chance alone, the 
criterion trial was taken as the trial 
preceding a run in which the level of 
7/8 correct was reached and maintained
over at least two cycles of the tape. 
If two cycles were not completed 
because of termination of learning 
upon one errorless cycle through the 
entire tape, the hypothetical cycle 
beyond termination was regarded as 
errorless for purposes of applying the 
criterion. 
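The mastery criterion can be written as a short function. The sketch assumes that a "trial" is one complete cycle of the tape (the text does not fix the unit) and takes, for one stimulus set, the number of correct anticipations out of eight in each cycle; `trials_to_mastery` is a hypothetical name.

```python
def trials_to_mastery(correct_per_cycle: list[int], criterion: int = 7) -> int:
    """Cycles preceding the first run of two cycles at >= criterion/8 correct.

    correct_per_cycle[k] is the number of correct anticipations (out of 8) on one
    stimulus set in tape cycle k + 1. Learning ended on an errorless cycle, so one
    further errorless cycle is appended, as the text specifies.
    """
    padded = list(correct_per_cycle) + [8]  # hypothetical cycle beyond termination
    for k in range(len(padded) - 1):
        if padded[k] >= criterion and padded[k + 1] >= criterion:
            return k  # number of cycles before the criterion run
    raise ValueError("criterion never reached")


print(trials_to_mastery([3, 5, 7, 4, 6, 7, 8]))  # 5
print(trials_to_mastery([2, 6, 7]))              # 2
```

In the first example an isolated 7/8 in the third cycle does not count, because it is not maintained into the next cycle.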

TABLE 1

PAIRED-ASSOCIATES LEARNING PERFORMANCE FOLLOWING FOUR
CONDITIONS OF PRIOR FAMILIARIZATION

(Based on analyses of variance of transformed scores)

Columns: (1) Trials to mastery; (2) Correct responses; (3) Within-set errors;
(4) Correct responses plus within-set errors; (5) Correct responses minus
within-set errors.

                                    (1)            (2)        (3)            (4)        (5)
Transformation                      100 log(x+3)   100 log x  100 log(x+2)   100 log x  100 log x
Total treatment F (df = 3 and 84)   9.69**         5.89**     17.52**        3.17*      9.65**
SD† (df = 84)                       19.02          9.59       [illegible]    6.33       23.88
Mean (N = 40)
  Identical                         102.2          150.6      55.5           155.4      144.0
  Different stimuli distinguished
    by same property                101.6          151.0      52.5           156.3      142.4
  Distinguished by different
    property                        121.4          143.0      93.4           155.9      118.9
  Control; no familiarization       111.9          148.8      86.2           159.4      130.4

* Significant at the 5% level.
** Significant at the 1% level.
† Based on error term for comparison of correlated means.

Because the distributions of original
scores showed a distinct tendency for
the standard deviations to be propor-
tional to the corresponding treatment
means, the original scores were trans-
formed to a logarithmic scale. Let-
ting x and x′ represent the original
and transformed scores, respectively,
the most satisfactory transformation
was found to be x′ = log (x + 3).
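Why a logarithm helps when standard deviations are proportional to means can be seen with two made-up groups, one exactly twice the other: the raw SD doubles, but on the log scale the two SDs coincide, because log(2x) = log 2 + log x merely shifts every score by a constant. The sketch also writes out the scaled transformation used in Table 1 and its inverse; the additive constant keeps scores of zero defined. The data are illustrative, not from the study.

```python
import math
from statistics import pstdev


def transform(x: float, c: float = 3.0) -> float:
    """Scaled transform of Table 1: 100 * log10(x + c); c = 3 for trials to mastery."""
    return 100 * math.log10(x + c)


def inverse(y: float, c: float = 3.0) -> float:
    """Back to the original scale: x = 10 ** (y / 100) - c."""
    return 10 ** (y / 100) - c


# Hypothetical groups whose SD is proportional to the mean (group_b = 2 * group_a).
group_a = [4.0, 8.0, 16.0]
group_b = [8.0, 16.0, 32.0]
print(round(pstdev(group_b) / pstdev(group_a), 6))  # 2.0: raw SD doubles with the mean

# On a log scale the doubling becomes a constant shift, so the spread is unchanged.
log_a = [100 * math.log10(x) for x in group_a]
log_b = [100 * math.log10(x) for x in group_b]
print(math.isclose(pstdev(log_a), pstdev(log_b)))  # True
```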

The results of analysis of variance 
of the transformed scores are sum- 
marized in Column 1 of Table 1. 
(Throughout Table 1 the means and 
SD’s have been multiplied by a 
factor of 100 to reduce the number of 
decimal places and facilitate inter- 
pretation.) It will be noted that the 
F value for total treatment effects is 
significant well beyond the .01 level 
of confidence and that comparisons 
among the four treatments are exactly 
as predicted. The difference between 
the first two familiarization condi- 
tions—in both of which the distin- 
guishing characteristic was the same 
during familiarization and paired- 
associates learning—is small (t = 
0.15, df = 84) and does not approach 
statistical significance. Both of these 
conditions, however, show superior 
performance to the control condition, 
in which there was no prior familiari- 
zation (t = 2.28, df = 84, p < .02). 
On the other hand, paired-associates 
performance following the third con- 
dition of familiarization—in which 
familiarization and paired-associates 
stimuli were distinguished by differ- 
ent properties—was significantly 
poorer than performance under the 
control condition (t = 2.23, df = 84, 
p < .02). The difference between the 
extreme conditions, i.e., between the 
third familiarization condition, on the 
one hand, and the first and second 
familiarization conditions, on the 
other hand, was significant beyond the 
.0001 level of confidence (t ≥ 4.51,
df = 84). 
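The t values reported here, and those for the response measures that follow, can be approximately recovered from Table 1. The sketch assumes that the standard error of a difference between two condition means is SD† × √(2/N) with N = 40; this reading of the error term is an inference, not something the report states, but it reproduces the reported values to within about 0.02. Only columns whose SD† is legible are included.

```python
import math

N = 40  # Ss, each contributing a score under every condition

# Means and SD-dagger values transcribed from Table 1 (log scale, multiplied by 100).
TABLE = {
    "trials": {"sd": 19.02, "identical": 102.2, "same": 101.6,
               "different": 121.4, "control": 111.9},
    "correct": {"sd": 9.59, "identical": 150.6, "same": 151.0,
                "different": 143.0, "control": 148.8},
    "corrected": {"sd": 23.88, "identical": 144.0, "same": 142.4,
                  "different": 118.9, "control": 130.4},
}


def t_pair(column: str, a: str, b: str) -> float:
    """t for two condition means, assuming SE of the difference = SD * sqrt(2 / N)."""
    col = TABLE[column]
    return (col[a] - col[b]) / (col["sd"] * math.sqrt(2 / N))


print(round(t_pair("trials", "different", "control"), 2))    # 2.23
print(round(t_pair("trials", "different", "identical"), 2))  # 4.51
```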


Response measures.—Individual responses
to each stimulus presentation
were classified according to the fol- 
lowing four categories: failure to 
anticipate (no response), correct re- 
sponse, within-set error, or between- 
set error. Within-set errors were 
intrusions appropriate to the highly 
similar stimulus belonging to the same 
set as the stimulus being scored; be- 
tween-set errors were intrusions from 
highly distinct stimuli belonging to 
other sets. Excluding failures to 
anticipate, 16% of all responses were 
within-set errors, 79% were correct 
responses, and 5% were between-set 
errors. 

Separate analyses of variance were 
made of number of correct responses 
and number of within-set errors and 
of the sums and differences of these 
measures. As in the case of trials to 
mastery, the original distributions 
tended to be skewed with SD’s pro- 
portional to their corresponding 
means, and logarithmic transforma- 
tions were made in each case prior to 
performing analysis of variance. Re- 
sults of analysis of variance of the 
transformed scores are summarized 
in Columns 2-5 of Table 1. The 
addition and subtraction of correct 
responses and errors for the measures 
presented in Columns 4 and 5 were 
performed on the original data, prior 
to making the transformations. 

All F values for total treatment 
effects are seen to be statistically 
significant: that for correct responses 
plus errors is significant at the .05 
level; all others, well beyond the .01 
level. The results of analyses of 
correct responses and of within-set 
errors are substantially in agreement 
with expectation: little difference is 
obtained between the first two famil-
iarization conditions; performance is
significantly poorer under the third 
familiarization condition (t = 3.53 
and 5.37 for correct responses and 
errors, respectively; df = 84); and
performance under the control con- 
dition falls between these two ex- 
tremes. In one respect, however, 
discrepant results are obtained with 
the two measures. In the case of 
correct responses, the control con- 
dition differs significantly from the 
third familiarization condition (t = 
2.69, df = 84, p < .01), but does not
differ significantly from the first two
conditions (t ≤ 1.03, df = 84), while
in the case of errors the reverse is 
true—the control condition differs 
significantly from the first two famil- 
iarization conditions (t = 4.34, df = 
84, p < .001), but does not differ
significantly from the third condition 
(t = 1.03, df = 84). 

This discrepancy may be clarified 
by considering the sums of correct 
responses and within-set errors, pre- 
sented in Column 4. This measure 
gives the total number of responses
appropriate to a given stimulus pair
without regard for discrimination
between members of the pair. It 
will be seen that under the three 
familiarization conditions approxi- 
mately equal numbers of appropriate 
responses were obtained (t < 0.67, 
df = 84), whereas under the control 
condition an appreciably greater num- 
ber occurred (t ≥ 2.14, df = 84, p
< .04). The above pattern of re- 
sults may be accounted for by as- 
suming (a) that learning in the present 
problem includes a presolution phase 
during which the responses appro- 
priate to a given stimulus pair are 
associated with the gross character- 
istics of that pair but are not differ- 
entially associated with the two 
stimuli of that pair, and (b) that under 
the control condition more responses 
are made during this phase than are 
made under the three familiarization 
conditions. On the basis of chance, 
approximately equal numbers of cor-
rect responses and within-set errors
would be expected to occur during the 
presolution period. An increase in 
the number of presolution responses 
would therefore result in an inflation 
of both the correct-response and 
within-set error measures under the 
control condition as compared with 
the three familiarization conditions, 
thus producing the apparent dis- 
crepancy in the results of these two 
measures for the control condition. 
An estimate of the “true” number of 
correct responses, i.e., the number 
obtained because of ability to dis- 
criminate the two stimuli of a pair, 
may be obtained by subtracting the 
number of errors from the number of 
correct responses. (It will be noted 
that this procedure is identical to that 
of subtracting the number wrong from 
the number right on a true-false test 
to correct for guessing.) 
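The chance argument can be made concrete with hypothetical counts. In the sketch, responses based on discrimination are all correct, while presolution responses are assumed to divide evenly between correct responses and within-set errors; subtracting errors from correct responses then recovers the discriminated responses no matter how many presolution responses occurred. The counts are invented for illustration, and `response_counts` is a hypothetical name.

```python
def response_counts(discriminated: int, presolution: int) -> tuple[float, float, float]:
    """Return (correct, within-set errors, correct minus errors) under a toy model.

    Responses based on discrimination are all correct; presolution responses are
    assumed to fall half on each member of the pair, i.e., half correct by chance.
    """
    correct = discriminated + presolution / 2
    within_set_errors = presolution / 2
    return correct, within_set_errors, correct - within_set_errors


print(response_counts(30, 10))  # (35.0, 5.0, 30.0)  hypothetical familiarized S
print(response_counts(30, 20))  # (40.0, 10.0, 30.0) hypothetical control S
```

Both hypothetical Ss have 30 discriminated responses; the one with more presolution responses shows more correct responses and more within-set errors, but the same corrected score, which parallels the pattern attributed above to the control condition.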

Results of analysis of variance of 
correct responses minus errors (Col- 
umn 5), like those of trials to mastery, 
are in complete agreement with ex- 
pectation: no appreciable difference 
is obtained between the first two 
familiarization conditions (t = 0.31, 
df = 84); positive transfer, relative 
to the control condition, is obtained 
under both of these conditions (t
= 2.25, df = 84, p < .02); and nega-
tive transfer, relative to the control
condition, is obtained under the third
familiarization condition (t = 2.15,
df = 84, p < .02).


DISCUSSION


The experimental findings confirm the 
predictions stated at the outset and are 
interpreted as lending support to the 
observing-response formulation upon 
which the predictions were based. In 
particular, it is concluded that the same 
training procedures may lead to either 
positive or negative transfer of discrimi- 
nation training according to whether or 
not the stimuli in the two tasks are dis-
tinguished by the same properties.

Current views appear to differ some- 
what in their topographical description 
of observing responses. Spence’s for- 
mulation of an observing-response theory 
(8), for example, interprets observing re- 
sponse topographically as receptor ori- 
entation. Miller and Dollard (6), on 
the other hand, give this construct a more 
general interpretation and include such a 
sequence of responses as counting, which 
seems to involve more than orientation 
of receptors. While it appears reason- 
able to assume that receptor orientation 
is invariably involved, it seems doubtful 
whether all cases of observing responses 
can be fully described in terms of re- 
ceptor orientation. Accordingly, in the 
present formulation no attempt has been 
made to specify the topographical char- 
acteristics of observing responses. In- 
stead, this construct is defined in terms 
of its functional properties as any re- 
sponse which, when made to one or the 
other of a given pair of stimulus com- 
plexes which are different, consistently 
results in distinctive stimulation from 
those two stimulus complexes. 

An important distinction should be 
noted between the theory and procedure 
described in a recent report by Rossman 
and Goss (7) and those described in the 
present report. In the former study Ss 
first learned a set of distinct verbal re- 
sponses to a set of stimuli and subse- 
quently learned a set of differential motor 
responses to the same set of stimuli. In 
the present procedure, while Ss were re- 
quired to react differentially to the 
stimuli during familiarization training, 
neither of the two overt verbal responses, 
“same” or “different,” was selectively 
associated with any particular stimulus. 
Rather, both responses were reinforced 
an equal number of times to all stimuli. 
It is assumed that implicit observing 
responses were differentially associated 
with stimuli of different sets, not that a 
different response was associated with 
each and every individual stimulus. 

In the study of Rossman and Goss it 
was found that Ss with prior verbal
training learned the motor problem more
rapidly than Ss without prior verbal 
training. The authors concluded that 
during motor learning the previously 
acquired verbal responses occurred im-
plicitly and provided distinctive stimu-
lation not obtained by Ss in the control
group, who lacked these verbal associ- 
ations. Following the present observing- 
response formulation, the findings of 
Rossman and Goss are interpreted some-
what differently. It is assumed that in
the first task observing responses were
learned which facilitated the association 
of differential verbal responses to the 
stimuli in that task, and that the same 
observing responses, not the verbal re- 
sponses, were transferred to the second 
task and provided distinctive stimulation 
to which the motor responses in the 
second task were associated in turn. 
Certain of the findings reported by 
Rossman and Goss appear to support 
the latter interpretation. Questionnaire 
data obtained from their Ss indicated 
that Ss with verbal pretraining learned 
to look for identifying parts of the 
stimuli, but during motor learning did 
not consciously make the verbal re- 
sponses previously associated with these 
stimuli. Corroborating the questionnaire 
data is the fact that one group, instructed 
to make the verbal responses overtly dur- 
ing motor learning, did significantly worse 
than a group which did not verbalize 
overtly. The questionnaire results indi- 
cated that the 2-sec. interval allowed for 
responding did not give Ss enough time 
to think of both the verbal and motor 
responses. Considered together, their 
various results seem to be more conso- 
nant with the interpretation following 
from the present theoretical formulation 
than with the interpretation made by 
Rossman and Goss. 

The findings of the present study are 
in close agreement with those of an 
earlier study by Lashley of discrimi- 
nation learning by rats (3). Lashley’s 
method of investigation involved first 
training a rat to approach one and avoid 
the other of two figures in a jumping 
stand, and then testing the transfer of 
this reaction to new pairs of stimuli
comparable to those of the second and 
third familiarization conditions of the 
present experiment. As would be ex- 
pected on the basis of the present theo- 
retical formulation, Lashley’s findings 
indicated that rats successfully trans- 
ferred the discriminatory reaction only 
to pairs of test stimuli which differed in 
the same property as the training pair. 
The procedure employed by Lashley 
differed from that of the present experi- 
ment in that his procedure utilized the 
same differential overt responses during 
testing as during original training, 
whereas the present procedure did not. 

Recently, Lawrence (4,5), also studying 
rats, has obtained related findings, using 
a procedure in which the test problem 
involved learning an overt response dif- 
ferent from that learned during the 
original training problem. In these
studies two stimulus properties were 
varied independently during the original 
problem in such a way that to con- 
sistently receive reinforcement the animal 
was forced to respond to only one prop- 
erty. The results of a number of test 
procedures all supported the conclusion 
that as a result of original training the 
relevant cue increased in distinctiveness 
as compared with the irrelevant cue. 

The failure of Waters (9), in a study 
of human learning, to obtain faster 
learning following familiarization with 
nonsense-syllable stimuli can be attrib- 
uted in part to the nature of his familiari- 
zation procedure, which merely required 
the reading of syllables and not discrimi- 
nation between syllables. Because no 
discrimination was required there was no 
reason for Ss to learn to attend to dif- 
ferentiating properties of the stimuli 
during familiarization and, in terms of 
the present formulation, no reason to 
expect transfer of discrimination. 


SUMMARY 


From a theoretical analysis of discrimination 
learning in terms of implicit observing responses 
it was predicted that transfer of discrimination
training from one task to a second task would be
positive when the stimuli employed were dis- 
tinguished by the same property in both tasks, 
and that transfer would be negative when the 
stimuli were distinguished by different proper- 
ties in the two tasks. To test these predictions, 
comparisons were made of discrimination per-
formance during paired-associates learning fol-
lowing three different conditions of prior famil-
iarization training and a control condition
without prior training. The familiarization
conditions used (a) training stimuli identical to
the test stimuli, (b) training stimuli not identical
to the test stimuli but distinguished by the same
property as the latter, and (c) training stimuli not
identical to the test stimuli and distinguished by
a different property from the latter. Forty human
Ss served under the four conditions.

Results of analysis of variance of the number 
of trials to mastery under the four experimental 
conditions were in complete agreement with ex- 
pectation: no difference was obtained between 
the first two familiarization conditions; under 
both of these conditions positive transfer was 
obtained relative to the control condition; and 
under the third familiarization condition nega- 
tive transfer was obtained relative to the control 
condition. 

Results obtained with frequencies of correct 
responses and intrusion errors were complicated 
by differences among the four conditions in the 
total number of responses. However, a cor-
rected measure (number of correct responses
minus number of errors) gave results identical
to those obtained with trials to mastery. 
WEIGHT SCALES FROM RATIO JUDGMENTS AND
COMPARISONS OF EXISTENT WEIGHT SCALES¹


KATHERINE E. BAKER AND FRANK J. DUDEK 
University of Nebraska 


Developing scales from judgment 
data has been a persistent problem 
for psychologists and has resulted in 
numerous attempts to provide mean- 
ingful scales for psychological vari- 
ables. The present study provides 
additional experimental data in the 
area of weight scaling. Some im- 
portant aspects of a recently devel- 
oped and promising scaling technique 
applicable to psychological judgments 
are considered. Comparisons are 
made with previously determined 
scales, and results are related in 
terms of the experimental method- 
ologies and in terms of the rationales 
employed in the derivation of final 
scales. 

Metfessel (8), seeking an improve- 
ment over the fractionation technique, 
first suggested what has been called 
the constant sum method for reporting 
comparative judgments. Here judg- 
ments consist of dividing a given 
number of points, usually 100, be- 
tween members of paired stimuli so 
as to indicate their relative perceived 
magnitudes, i.e., their perceived ra- 
tios. Comrey (3) provided a pro- 
cedure for determining scale values for 
nine line lengths from judgments made 
in this manner. Each stimulus in the 
series was paired with every other one 
and, from the point assignments 
made, the stimuli were placed in rank 
order. Once the stimuli were ordered, 
the procedure, as Comrey developed 
it, involved determining the average 
ratio between adjacent stimuli. Defining
the smallest stimulus variable
as the unit of the scale, and assigning
it the scale value of 1.00, Comrey
computed scale values for subsequent
stimuli in the series by a process of
successive multiplication. The re-
sulting scale has a defined psycho-
logical unit and provides scale values
for stimuli in the series in terms of
multiples of the unit. The scale
values obtained were concordant with
physical values and furnished some
evidence for the effectiveness of the
technique.

¹ We are grateful to the University Research
Council at the University of Nebraska for finan-
cial assistance which made this study possible.

The present study employs this 
basic scaling technique in the scaling 
of lifted weights and considers certain 
aspects of methodology and various 
characteristics of judgments. Two 
experiments, using the constant sum 
method, are to be reported. Experi- 
ment I provides data which, along 
with showing the relationship between 
psychological and physical scale val- 
ues, indicate judgment biases which 
are important in interpreting final 
scale values; Exp. II investigates the 
effects of changing the method of 
expressing ratio judgments. In ad- 
dition, the scales yielded in Exp. I 
and II are compared with results from 
a third experiment where the method 
of limits was used as a model for the 
fractionation technique. Compari- 
sons are made with four previously 
reported weight scales which likewise 
purport to reflect ratio judgments. 
These scales are compared as to their 
derivations both in terms of experi- 
mental procedures and in terms of the 
methods employed to convert basic 
judgment data into scale values. 
That variations within methods affect
resultant scales will be evident. 
While the purpose of Exp. I is 
obvious, the rationale of Exp. II 
requires some explanation. Although 
the points assigned in the constant 
sum method imply ratios between 
stimuli, it needs to be pointed out 
that there is not a linear relation 
between points assigned and ratios 
implied. That is, a given change in 
points assigned stimuli does not pro- 
duce a constant change in the implied 
ratio between stimuli. Rather, small 
differences in point assignments mean 
increasingly large differences in im- 
plied ratios as point divisions go from 
50-50 to 1-99. To recognize this 
fact appears to require a fair degree 
of mathematical sophistication on the 
part of S. It is possible, therefore, 
that methods of expressing judgments 
influence the scales obtained. An- 
other way to express judgments would 
be to indicate stimulus ratios directly. 
This task would not seem to require 
as much mathematical facility as is 
necessary to express ratios by division 
of a constant sum of points. Experi- 
ment II examines certain character- 
istics of judgments when Ss are asked 
to indicate directly the ratios ob- 
taining between paired stimuli. 


METHOD


Experiment I.—Twenty graduate students in 
psychology and educational psychology were Ss. 
Each S served for two experimental sessions on 
different days. 

A boxlike table, enclosed on all except E’s 
side, was constructed to present pairs of weights 
to be judged. Two holes were drilled in the top
of the table and a small length of chain was 
suspended through each hole. At the top end 
of each chain was a ring and at the bottom a hook 
to which lead weights could be attached. The 
S saw only the two rings, labeled A and B, and 
lifted the rings one at a time with the same hand 
after appropriate pairs of weights were attached 
by E. In this way it was hoped that visual cues 
that might possibly lead to expectancy and set 
might be eliminated. At any rate, all cues were 
constant for all weights. The table with its
rings was so constructed that S lifted the weights
with the fingers of his outstretched arm. The 
S could rest his elbow and forearm on the table 
top if he desired. 

Stimuli judged were nine lead bar weights. 
Total weights, in grams, of the ring-chain-weight 
combinations were as follows: A = 108.5, B 
= 180.5, C = 249.4, D = 392.8, E = 426.6, F 
= 568.3, G = 689.8, H = 749.8, I = 919.8. 
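The physical scale values and adjacent ratios used later as a standard of comparison can be recomputed directly from the gram weights listed above. What follows is a minimal Python sketch; the names `WEIGHTS_G`, `physical_scale`, and `adjacent_ratios` are our own labels and do not come from the study.

```python
# Total weights in grams of the ring-chain-weight combinations (Exp. I).
WEIGHTS_G = {
    "A": 108.5, "B": 180.5, "C": 249.4, "D": 392.8, "E": 426.6,
    "F": 568.3, "G": 689.8, "H": 749.8, "I": 919.8,
}

labels = list(WEIGHTS_G)  # dicts keep insertion order (Python 3.7+)

# Physical scale values, with the lightest weight (A) as the unit, A = 1.00.
physical_scale = {s: WEIGHTS_G[s] / WEIGHTS_G["A"] for s in labels}

# Physically measured ratios between adjacent stimuli: B/A, C/B, ..., I/H.
adjacent_ratios = {
    f"{heavier}/{lighter}": WEIGHTS_G[heavier] / WEIGHTS_G[lighter]
    for lighter, heavier in zip(labels, labels[1:])
}

for s in labels:
    print(f"{s}: {physical_scale[s]:.3f}")
for name, ratio in adjacent_ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Rounded to three places, these values match the measured entries of Table 2 (for example, 1.664 for B/A and 8.477 for I).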

Pairing each of the nine weights with every 
other weight required 36 stimulus pairs. How- 
ever, because of the possibility of time errors 
that might arise since stimuli could not be judged 
simultaneously, each stimulus pair was presented 
twice, once in A-B order and once in B-A order.
Thus, in each experimental session each S made 
72 judgments. While Ss were instructed always 
to lift the stimulus at the A position first, they 
were allowed to lift stimuli of any pair as many 
times as they wished before making a final 
judgment. A random order for presentation of 
stimulus pairs was employed, except that no 
stimulus was allowed to appear on two successive 
comparisons. A check indicated that stimulus 
pairs representing large, medium, and small total 
weights were well distributed throughout the 
series of trials. 
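The presentation constraints above call for all 36 pairs in both lifting orders (72 trials), with no stimulus appearing on two successive comparisons. A schedule meeting these constraints can be produced by a simple randomized procedure. The sketch below shows one way to do it and is not the authors' procedure. It builds the order greedily and starts over whenever it reaches a dead end. The function name `make_schedule` is hypothetical. The authors' additional check, that heavy, medium, and light pairs were well spread through the series, is not implemented.

```python
import itertools
import random

STIMULI = "ABCDEFGHI"


def make_schedule(seed=None, max_attempts=10_000):
    """Order all 72 ordered pairs so that successive pairs share no stimulus.

    Each tuple is (stimulus at position A, stimulus at position B), so every
    unordered pair appears twice, once in each lifting order.
    """
    rng = random.Random(seed)
    all_pairs = list(itertools.permutations(STIMULI, 2))  # 9 * 8 = 72
    for _ in range(max_attempts):
        remaining = all_pairs[:]
        schedule = []
        while remaining:
            previous = set(schedule[-1]) if schedule else set()
            candidates = [p for p in remaining if previous.isdisjoint(p)]
            if not candidates:
                break  # dead end: discard this attempt and start over
            choice = rng.choice(candidates)
            remaining.remove(choice)
            schedule.append(choice)
        else:
            return schedule  # every pair placed without a dead end
    raise RuntimeError("no valid schedule found")


schedule = make_schedule(seed=1)
print(len(schedule), schedule[:4])
```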

The Ss were instructed to divide 100 points 
between the stimuli of each pair so as to indicate 
their relative magnitudes. A brief practice 
series preceded each experimental session and 
considerable care was taken to acquaint S with 
the nature of his task prior to making judgments. 

Experiment II.—Ten Ss served in two experi- 
mental sessions on separate days. Some Ss had 
served in Exp. I and were already familiar with 
the experimental task of the constant sum 
method. 

The same nine weights were presented in the 
same order as in Exp. I, using the apparatus 
described above. Instead of dividing 100 points 
between members of pairs, Ss were instructed 
to say directly how many times heavier the 
heavier weight felt in comparison with the lighter 
one. That is, all judgments took the form X 
to 1, where X was the perceived multiple of the 
lighter unit weight. 

A short orientation series of 10 judgments 
representing various kinds of weight pairs pre- 
ceded the experimental series. In the practice 
series, attempts were made to induce a set for S 
to make as precise discriminations as possible. 
Thus, if S judged the heavier weight of the first 
practice pair to be 1} times as heavy as the 
lighter one, he was casually asked, “Is that as 
close as you can judge? Could it be 14 or 14; or 
perhaps 1% or 1$?”, etc. While S was never 
informed of the absolute range of weights to be 
judged, the practice series included quite light 
stimulus pairs, quite heavy ones, and very dis- 
similar pairs. 
TABLE 1

RATIOS OF WEIGHT STIMULI COMPUTED FROM POINTS ASSIGNED IN
COMPARATIVE JUDGMENTS

(Experiment I)

                               Stimulus
Stimulus      A      B      C      D      E      F      G      H      I
   A             1.483  1.890  2.968  3.338  4.785  7.097  6.937  9.695
   B       .674         1.317  1.898  2.320  3.208  4.579  4.831  7.024
   C       .529   .759         1.505  1.555  2.357  2.925  3.538  5.349
   D       .337   .527   .665         1.110  1.474  1.956  2.089  2.882
   E       .300   .431   .643   .901         1.320  1.688  1.949  2.692
   F       .209   .312   .424   .678   .757         1.255  1.370  1.748
   G       .141   .218   .342   .511   .592   .797         1.107  1.315
   H       .144   .207   .283   .479   .513   .730   .903         1.261
   I       .103   .142   .187   .347   .372   .572   .760   .793

RESULTS


Experiment I²


Derivation of scale values.—For pur-
poses of determining scale values the
data of all sessions were pooled. This
meant that a total of 8,000 points was
assigned to each pair comparison and
the division of these 8,000 points indi-
cated the judged ratio. The obtained
ratios between stimuli are shown in
Table 1. These ratios are derived
merely by dividing the points assigned
the stimuli involved in the pair com-
parisons.

² The data analyzed in Exp. I were collected
by Mr. Harold Nelson and appear in an un-
published master's thesis at the University of
Nebraska.
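The 8,000-point total follows from 20 Ss, two sessions, and two lifting orders per pair, each judgment dividing 100 points. Each Table 1 entry is then the ratio of the pooled points given to the two members of a pair. The sketch below illustrates this. The point split in the example call is invented for illustration and is not the study's data, and `table1_entry` is a hypothetical helper name.

```python
# 20 Ss x 2 sessions x 2 lifting orders, each judgment dividing 100 points.
SUBJECTS, SESSIONS, ORDERS, POINTS_PER_JUDGMENT = 20, 2, 2, 100
POOLED_TOTAL = SUBJECTS * SESSIONS * ORDERS * POINTS_PER_JUDGMENT  # 8,000


def table1_entry(points_to_row, points_to_column):
    """Judged ratio column/row, as laid out in Table 1 (row A, column B is B/A)."""
    if points_to_row + points_to_column != POOLED_TOTAL:
        raise ValueError("pooled points for one pair must sum to 8,000")
    return points_to_column / points_to_row


# Hypothetical pooled split for one pair: 3,200 points to A, 4,800 to B.
b_over_a = table1_entry(3200, 4800)
a_over_b = table1_entry(4800, 3200)  # the mirror-image cell is the reciprocal
print(POOLED_TOTAL, b_over_a, a_over_b)
```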

To make use of all of the informa-
tion available, it is assumed that Ss
make consistent judgments and the
ratio of A/B, for example, is indicated
not only from the judgments involving
these two stimuli directly, but also
by the judgments of Stimulus A and
Stimulus B relative to all other
stimuli. Thus: A/B = (A/C) ÷ (B/C)
= (A/D) ÷ (B/D), etc. In this manner
eight different estimates of each of the
ratios between adjacent stimuli are
obtained. Computationally this is
achieved by dividing each ratio in
Table 1 into the value in the same row
in the next column to the right. Car-
rying out this operation results in the
ratios shown in Table 2.


TABLE 2

RATIOS OF ADJACENT STIMULI, AVERAGE RATIOS, AND SCALE VALUES FOR
WEIGHTS OF THE STIMULUS SERIES

(Experiment I)

                       Ratios between adjacent stimuli              Scale Values
Stimulus          B/A    C/B    D/C    E/D    F/E    G/F    H/G    I/H   Obt.  Meas.  Diff.
A                      1.275  1.570  1.125  1.433  1.483   .977  1.398  1.000  1.000
B               1.483         1.441  1.222  1.383  1.427  1.055  1.454  1.472  1.664  -.192
C               1.435  1.317         1.033  1.516  1.241  1.209  1.512  2.015  2.299  -.284
D               1.564  1.261  1.505         1.328  1.327  1.068  1.380  3.164  3.620  -.456
E               1.439  1.492  1.401  1.110         1.279  1.154  1.381  3.524  3.932  -.408
F               1.491  1.361  1.599  1.117  1.320         1.091  1.276  4.971  5.238  -.267
G               1.550  1.565  1.495  1.159  1.345  1.255         1.188  6.574  6.358   .216
H               1.436  1.366  1.694  1.072  1.423  1.237  1.107         7.154  6.911   .243
I               1.380  1.313  1.856  1.071  1.540  1.329  1.044  1.261  9.702  8.477  1.225
Average ratio   1.472  1.369  1.570  1.114  1.411  1.322  1.088  1.356
Meas. ratio     1.664  1.382  1.575  1.086  1.332  1.214  1.087  1.227
Difference      -.192  -.013  -.005   .028   .079   .108   .001   .129

The eight estimates for each of the 
ratios between adjacent stimuli are 
averaged to obtain a final estimate of 
a given adjacent stimulus ratio. 
These averages are shown in Table 2, 
and also included for comparison 
purposes are the ratios determined by 
physical measurement. 

Assigning the scale value 1.00 to 
the smallest stimulus in the series, 
which thereby defines this as the 
“unit” of the scale, scale values for 
remaining stimuli are computed by 
successively multiplying each scale
value by the average ratio this stimu- 
lus has to the stimulus immediately 
adjacent to it. Thus, A = 1.00; B 
= (A)(B/A) = (1)(1.47) = 1.47; C 
= (B)(C/B) = (1.47)(1.37) = 2.02; 
etc. Scale values obtained in this 
way are also shown in Table 2 and for 
comparative purposes there are in- 
cluded scale values as determined by 
physical measurement. 
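Both the indirect estimation of adjacent ratios and the cumulative multiplication can be reproduced from the rounded entries of Table 1. The Python sketch below rests on one assumption. For the ratio of stimulus k+1 to stimulus k, it leaves out the estimate from row k, which matches the blank cells in Table 2. The names `TABLE1`, `adjacent_ratio_estimates`, and `cumulative_scale` are ours.

```python
# Table 1 (Exp. I): TABLE1[r][c] is the judged ratio of stimulus c to
# stimulus r (points to c divided by points to r); the diagonal is set to 1.
TABLE1 = [
    [1.000, 1.483, 1.890, 2.968, 3.338, 4.785, 7.097, 6.937, 9.695],  # A
    [0.674, 1.000, 1.317, 1.898, 2.320, 3.208, 4.579, 4.831, 7.024],  # B
    [0.529, 0.759, 1.000, 1.505, 1.555, 2.357, 2.925, 3.538, 5.349],  # C
    [0.337, 0.527, 0.665, 1.000, 1.110, 1.474, 1.956, 2.089, 2.882],  # D
    [0.300, 0.431, 0.643, 0.901, 1.000, 1.320, 1.688, 1.949, 2.692],  # E
    [0.209, 0.312, 0.424, 0.678, 0.757, 1.000, 1.255, 1.370, 1.748],  # F
    [0.141, 0.218, 0.342, 0.511, 0.592, 0.797, 1.000, 1.107, 1.315],  # G
    [0.144, 0.207, 0.283, 0.479, 0.513, 0.730, 0.903, 1.000, 1.261],  # H
    [0.103, 0.142, 0.187, 0.347, 0.372, 0.572, 0.760, 0.793, 1.000],  # I
]
LABELS = "ABCDEFGHI"


def adjacent_ratio_estimates(table):
    """For each adjacent ratio (k+1)/k, one estimate per row r other than k."""
    n = len(table)
    return [
        [table[r][k + 1] / table[r][k] for r in range(n) if r != k]
        for k in range(n - 1)
    ]


def cumulative_scale(average_ratios):
    """Scale values by successive multiplication, with A = 1.00 as the unit."""
    scale = [1.0]
    for ratio in average_ratios:
        scale.append(scale[-1] * ratio)
    return scale


estimates = adjacent_ratio_estimates(TABLE1)
averages = [sum(row) / len(row) for row in estimates]
scale = cumulative_scale(averages)

for label, value in zip(LABELS, scale):
    print(f"{label}: {value:.3f}")
```

Table 1 is rounded to three decimals, so these results agree with Table 2 only to about the third decimal. The average B/A comes out near 1.472, and the scale value for I near 9.70.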

The discrepancies between obtained 
and measured scale values indicate 
that the scale values by psychological 
and physical measurement are nearly 
the same. In this respect results from
this study of weight judgments par- 
allel the results Comrey (3) found for 
line judgments. In both studies the 
larger stimulus values tend to be 
overestimated. In Comrey’s study 
the longest line was 7.92 times as long 
as the unit of the physical scale, and 
the estimate for this stimulus was 
9.16. In comparison, the heaviest 
weight in this study was 8.47 times 
the unit physical weight, and the 
estimate of this value was 9.70. 


Looking more closely at the ob- 
tained scale values, it is evident that 
there is a tendency for perceived 
weight to increase less rapidly per 
unit change in physical weight at the 
low end of the scale compared to the 
heavier weights where the physical 
scale values are exceeded by the 
psychological values. Whether or not 
experimental error in judgment sam- 
ples accounts for this deviation from 
a one-to-one correspondence between 
psychological and physical scale val- 
ues needs to be determined in sub- 
sequent experimentation. 

With regard to the question of the 
possible ratio properties of the psy- 
chological scale [see Stevens (9) and 
Comrey (2) for general discussions 
of scale properties], the relationship
between psychological and physical 
measurement tells us little. While 
it may be encouraging if a one-to-one 
correspondence may be claimed be- 
tween a psychological scale and a 
physical scale known to have ratio 
properties, the obtained psychological 
scale requires demonstration of the 
internal consistency and/or checking 
against other criteria for ratio proper- 
ties. The present experiment does 
not provide for such tests of the 
constant sum method of scaling 
weights.³

Alternate computational procedures 
for scale values.—It seems likely that 
overestimation of higher scale values 
in this and in Comrey’s study may 
be, in part, due to an artifact that 
arises from the computational method, 
since the process of successive multi- 
plication tends not only to preserve 
but also to exaggerate any over- 
estimates made in estimating ratios. 
It seems also that overestimates are
more influential in determining final
scale values than are underestimates,
since they do not completely "cancel
out" as one might expect. Alternate
computational procedures were de-
vised which reduce the dependence of
higher scale values upon estimates of
the lower scale values. While theo-
retically the procedures appear to be
equivalent, the difference in resultant
scale values shows they are not.

³ See Guilford and Dingman (6) for an example
of the type of test of ratio properties which is
applicable.


TABLE 3

COMPARISONS OF MEASURED VALUES AND SCALE VALUES FOR WEIGHT
STIMULI OBTAINED BY ALTERNATE COMPUTATIONAL PROCEDURES

                              Stimulus
Procedure                   A      B      C
Physical Measmt.         1.00   1.66   2.30
Ratio of stimuli to A    1.00   1.47   2.02
  Difference                    -.19   -.28
Ratio of stimuli to E    1.00   1.48   1.98
  Difference                    -.18   -.32


To avoid cumulative multiplication and still 
maintain the smallest stimulus as the defined 
unit of the scale, the point assignments were 
used to compute several estimates of the ratios 
of each stimulus to the smallest stimulus, rather 
than getting several estimates of the ratios 
between adjacent stimuli. Since the smallest 
stimulus, A, is defined as the unit, this average 
ratio value is the scale value itself. These esti- 
mates are computed from the data of Table 1 
by dividing each ratio in the first column into 
each of the other ratios in the corresponding row. 
The entries in the first row of Table 1 are the 
direct estimates of B/A, C/A,...I/A. In 
the second row, dividing A/B into C/B we get
another estimate of C/A, etc., throughout the
table. In this manner eight estimates of each
of the ratios B/A, C/A, . . . I/A are obtained
and the average of these is taken as the scale 
value, where A is the unit of the scale, as before. 

A third set of scale values was obtained from 
the same point assignment data. As will be 
shown later, the relationship of a perceived ratio 
to the physically measured ratio depends upon 
the similarity within pairs of stimuli. Thus, if 
the median stimulus (E) were taken as the unit 
of the scale, the influence of extreme dissimilarity 
between stimuli could be reduced. Conse- 
quently, scale values were also determined from 
the average ratios obtained when all stimuli were 
related to E as the arbitrary unit. Average 
ratios, found in a manner analogous to that 
described above, were obtained for A/E, B/E, 
. . . I/E and then these were multiplied by the 


constant E/A in order to get scale values that 
could be compared with previous determinations 
(i.e., with A = 1.00). 


Table 3 contains the scale values 
obtained by the alternate methods of 
computation: (a) average ratios of 
each stimulus to A, and (b) average 
ratios of each stimulus to E trans- 
formed to make A the unit of the 
scale. The two alternate methods do 
not result in as large psychological 
scale values for the larger physical 
values as does the first method, as 
shown in Table 2. Method of com- 
putation, thus, is seen to be a factor 
in what exact scale values are ob- 
tained. 
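The first alternate procedure, which takes the average ratio of each stimulus to A, can be sketched in the same way. Which eight rows are averaged is our assumption. Using every row except row A reproduces the 1.47 and 2.02 that Table 3 reports for B and C. The helper name `ratios_to_unit` is hypothetical. `TABLE1` repeats the Table 1 entries so that the block runs on its own.

```python
# Table 1 (Exp. I): TABLE1[r][c] = judged ratio of stimulus c to stimulus r.
TABLE1 = [
    [1.000, 1.483, 1.890, 2.968, 3.338, 4.785, 7.097, 6.937, 9.695],  # A
    [0.674, 1.000, 1.317, 1.898, 2.320, 3.208, 4.579, 4.831, 7.024],  # B
    [0.529, 0.759, 1.000, 1.505, 1.555, 2.357, 2.925, 3.538, 5.349],  # C
    [0.337, 0.527, 0.665, 1.000, 1.110, 1.474, 1.956, 2.089, 2.882],  # D
    [0.300, 0.431, 0.643, 0.901, 1.000, 1.320, 1.688, 1.949, 2.692],  # E
    [0.209, 0.312, 0.424, 0.678, 0.757, 1.000, 1.255, 1.370, 1.748],  # F
    [0.141, 0.218, 0.342, 0.511, 0.592, 0.797, 1.000, 1.107, 1.315],  # G
    [0.144, 0.207, 0.283, 0.479, 0.513, 0.730, 0.903, 1.000, 1.261],  # H
    [0.103, 0.142, 0.187, 0.347, 0.372, 0.572, 0.760, 0.793, 1.000],  # I
]


def ratios_to_unit(table, unit=0):
    """Average ratio of each stimulus to the unit stimulus (no chained products)."""
    n = len(table)
    scale = []
    for x in range(n):
        if x == unit:
            scale.append(1.0)
            continue
        # Row r gives (x/r) / (unit/r) = x/unit; the unit's own row is skipped.
        estimates = [table[r][x] / table[r][unit] for r in range(n) if r != unit]
        scale.append(sum(estimates) / len(estimates))
    return scale


scale_to_a = ratios_to_unit(TABLE1)
print([round(v, 2) for v in scale_to_a])
```

Calling the function with `unit=4` gives the E-anchored averages. The text then multiplies these by a constant E/A, but that step is left out here because the value of the constant is not given.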

While the scale values determined 
from alternate computational pro- 
cedures are more closely related to 
physical values, evaluation of the 
methods does not depend on a cri- 
terion of approximation to the physi- 
cal scale. It does seem, however, 
that the artifact pointed out is in- 
herent in the cumulative multipli- 
cation procedure and that either of 
the alternative methods would be 
more desirable on logical grounds. 
There seems to be no basis for ob- 
jection to the latter procedures al- 
though there is for the former one, 
and, consequently, in further work 
with the constant sum method we 
have employed the computational 
procedure of finding all estimates of 
the ratio of each stimulus of a set to 
the smallest stimulus, i.e., the "unit"
stimulus. 

Comparison of estimated and meas-
ured ratios.—Scale values computed
by a process of cumulative multipli- 
cation do not fully preserve the 
judgment characteristics revealed in 
the ratios between adjacent stimuli, 
even though these ratios are employed 
in deriving scale values. This fact 
is shown by a closer examination of 
the adjacent ratios included in Table 
2. 

Again, relatively close correspond- 
ence is seen between the judgment 
data and the results of physical 
measurement. Comrey found no dis- 
crepancy greater than .08 between 
measured and average estimated ra- 
tios for lengths of lines. In this 
study the largest discrepancy was of 
the magnitude of .20, although the 
median algebraic discrepancy was 
only about .01 and the median ab- 
solute discrepancy was approximately 
.05. It seems likely that judgments
involving weights where stimuli must 
be presented in succession rather than 
simultaneously, as is possible with 
lines, may be subject to certain 
unique kinds of errors. Consequently 
somewhat more variability in judged 
ratios is perhaps to be expected. 
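The median discrepancies quoted above can be checked against the bottom row of Table 2. With eight values, the median is the mean of the two middle values. This gives about .015 (algebraic) and .054 (absolute), which the text rounds to about .01 and .05. The dictionary name below is ours.

```python
from statistics import median

# Bottom row of Table 2: average estimated minus measured adjacent ratios.
DIFFERENCES = {
    "B/A": -0.192, "C/B": -0.013, "D/C": -0.005, "E/D": 0.028,
    "F/E": 0.079, "G/F": 0.108, "H/G": 0.001, "I/H": 0.129,
}

values = list(DIFFERENCES.values())
median_algebraic = median(values)                  # signs retained
median_absolute = median([abs(v) for v in values])  # direction ignored
largest = max(DIFFERENCES, key=lambda k: abs(DIFFERENCES[k]))

print(f"median algebraic: {median_algebraic:.4f}")
print(f"median absolute:  {median_absolute:.4f}")
print(f"largest: {largest} ({DIFFERENCES[largest]:+.3f})")
```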

Differences between average esti- 
mated ratios and the measured ratios 
for adjacent stimuli show a rather 
consistent trend. Ratios for stimuli 
at the “light” end of the stimulus 
continuum tend to be underestimated 
(differences between stimuli are
judged to be smaller than physical 
measurement indicates), while at the 
“heavy” end of the continuum the 
average ratios between adjacent stim-
uli are overestimated.⁴ The largest


⁴ Guilford and Dingman (6), also working
with weights, found indication of a constant
error very closely related to, if not the same as,
the bias noted here.
differences in size of perceived ratios
and physically measured ratios occur 
at the two extremes of the stimulus 
continuum. Thus, as Table 2 shows, 
for the two smallest stimuli, B/A, a 
measured ratio of 1.66 was judged to 
be 1.47, while for I/H a measured 
ratio of 1.23 was estimated as 1.36. 

Comparison of estimated and meas- 
ured point assignments.—While the 
findings for average estimates quite 
clearly have their basis in the indi- 
vidual pair comparisons, these ratios 
are the result of various mathematical 
procedures which obscure certain ad- 
ditional features of judgments as 
indicated in the point assignments 
themselves. Therefore, it is necessary 
to examine the point assignment data 
to complete the analysis. 

Table 4 presents the differences 
between number of points assigned to 
the smaller stimulus in a given pair 
comparison and the number of points 
that would be assigned from physical 
values. Since the two halves of the 
table would be symmetrical, only the 
points assigned to the lighter stimulus 
are considered. A positive difference 
indicates that stimuli were judged to 
be more similar than physical meas- 
urement implies. 
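In the sketch below, the "points that would be assigned from physical values" are read as a 100-point split in proportion to the two gram weights, which is the natural reading of the constant sum method. On that reading, a Table 4 entry is judged points minus this value. The function names are hypothetical, and the judged value in the example is invented.

```python
# Total weights in grams (Exp. I).
WEIGHTS_G = {
    "A": 108.5, "B": 180.5, "C": 249.4, "D": 392.8, "E": 426.6,
    "F": 568.3, "G": 689.8, "H": 749.8, "I": 919.8,
}


def measured_points(lighter, heavier, total=100.0):
    """Points the lighter stimulus gets if points are split in proportion to weight."""
    w_light, w_heavy = WEIGHTS_G[lighter], WEIGHTS_G[heavier]
    return total * w_light / (w_light + w_heavy)


def judged_minus_measured(judged_points, lighter, heavier):
    """Positive when the pair is judged more similar than the weights imply."""
    return judged_points - measured_points(lighter, heavier)


print(round(measured_points("A", "B"), 2))  # 37.54
print(round(measured_points("A", "I"), 2))  # 10.55
# Hypothetical judgment giving A 40 of 100 points against B:
print(round(judged_minus_measured(40.0, "A", "B"), 2))
```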

Separate comparisons were made 
for the two orders in which stimuli 
were picked up, l designating that
the lighter weight was lifted first, h
that the heavier weight was lifted 
first. The differences in judgments 
in some respects suggest the time- 
order error frequently found in weight- 
estimation data. It should be pointed 
out that experimental controls were
specifically instituted in an attempt 
to minimize an expected time-order 
effect. Although the experiment re- 
quired the weight at Position A to 
be lifted before the other, Ss were 
allowed to lift individual weights 
being compared as many times as 
TABLE 4

DIFFERENCES IN AVERAGE POINT ASSIGNMENTS FOR JUDGMENTS AND
POINT ASSIGNMENTS FROM PHYSICAL MEASUREMENT

(Judged - Measured Value)

                               Stimulus
Stimulus        A      B      C      D      E      F      G      H
B   l*        4.0
    h         1.5
C   l         5.0    2.2
    h         3.6     .1
D   l         5.5    4.5    1.2
    h         1.6    1.6    1.0
E   l         2.8    2.7    2.2    -.5
    h           ?   -1.9    2.3    -.5
F   l          .5    -.2   -1.4    -.6     .7
    h         2.0    -.5    -.1    -.3    -.2
G   l        -1.7   -4.4   -3.1   -2.4   -2.0   -1.6
    h         -.8   -1.2     .9   -2.5    -.1    -.1
H   l        -1.6   -3.0   -4.8   -4.3   -3.6   -1.4   -1.0
    h         1.5   -1.5   -1.0      ?   -1.1    -.4      ?
I   l         -.2   -5.5   -6.5   -5.5   -5.1   -3.6    -.3   -1.9
    h        -2.2   -2.4   -4.6   -2.8   -4.1    -.1    1.0     .5

* Values in rows labeled "l" are from trials where the lighter stimulus was picked up first; in rows labeled "h",
from trials where the heavier stimulus was picked up first.


necessary before making a final judg- 
ment. In spite of such a procedure, 
order of lifting appears to be a de- 
termining variable in judgments, since 
in 29 of the 36 pairs lifting the lighter 
stimulus first resulted in the larger 
absolute difference between psycho- 
logical and physical measurements. 

Table 4 shows that the direction
and magnitude of differences in per- 
ceived and measured weight depended 
upon the type of comparison being 
made. Thus, for comparisons in- 
volving the lighter weights of the 
series, differences between weights
were perceived to be smaller than
physical measurement would indi- 
cate, and this was especially true 
when the lighter of the two stimuli 
was the first stimulus to be picked 
up in the pair comparison. On the 
other hand, when a comparison in- 
volved a relatively light weight to- 
gether with a heavy weight, then the 
difference between the weights was 
perceived to be larger than physical 
measurement would indicate. Again 
this tendency was especially marked 
when the lighter of the two stimuli 
was picked up first. The three heav-
iest weights (G, H, I) were almost
invariably overestimated in relation 
to all lighter stimuli. Among the 
four lightest stimuli (A, B, C, D) the 
lighter stimulus of a pair was in- 
variably overestimated in relation to 
the other. 

The pattern of discrepancies is so 
systematic that it suggests a con- 
sistent perceptual or judgmental bias. 
Further evidence as to the consist- 
encies of the judgments was obtained 
by correlating the average point 
assignments made to the individual 
pair comparisons in the two experi- 
mental sessions. This correlation of 
.96 indicated that the average point 
assignments are quite reliable. 

The pattern of over- and under- 
estimations in Table 4 is the source 
of the observed trend of over- and 
underestimations in average ratios 
between adjacent stimuli. It is note- 
worthy also that these judgment 
biases are counterbalanced somewhat 
when indirect estimates of ratios in- 
volving dissimilar stimuli are averaged 
in with other indirect estimates in- 
volving similar stimuli to arrive at a 
single estimate of the ratio between 
adjacent stimuli. 


One issue that cannot be clarified from these 
data is whether the same relationship between 
over- and underestimations and similarity be- 
tween paired stimuli would appear for any range 
of magnitudes of weights chosen to be scaled, 
or whether there is some absolute “breaking 
point” between over- and underestimation which 
just happened to be covered in our series of 
weights. Likewise, it is not clear whether this 
type of perceptual bias is characteristic of judg- 
ments in general, or whether it is peculiar to 
judgments of weight. 


The differences in scale values ob- 
tained by different computational 
procedures serve to point up the fact 
that these consistent judgment biases 
do affect the final scale values in 
accordance with the type of arith-
metical treatments carried out on
point assignments. That is, theo-
retically equivalent ways of deriving 
scales from point assignments yield 
different final results because judg- 
ment biases are weighted differently 
by each of these procedures. Pre- 
dictions made from these scales would 
not be uniformly valid because scale 
values have included in their deter- 
mination certain biases that apply 
only at certain parts of the scale or 
are obscured completely by the com- 
putational procedures involved. 

Variability of point assignments.— 
Another problem involves the vari- 
abilities of point assignments at the 
two extremes of the continuum. For 
our data the variability of point 
assignments was much less when 
judgments involved relatively similar 
stimuli (for example, when point 
divisions were between 50-50 and 
40-60) than when markedly different 
stimuli were involved (i.e., requiring 
point assignments from 20-80 to 
1-99). In the first case the SD’s 
were around three or four points, 
while in the latter case they were 
almost twice as high. 


Even if the variabilities were to be equal, 
there remains an important artifact which arises 
when point assignments are converted into ratios. 
This is due to the fact that the relation between 
points assigned and ratios implied by the point 
assignments is not linear. For example, if two 
stimuli are in the ratio 1:1.5, the point assign- 
ment reflecting this relationship would be 40-60. 
Varying the point assignments ± 5 points does
not alter this ratio greatly. Thus, 35-65 would
imply a ratio of 1:1.86, while 45-55 would indi-
cate a ratio of 1:1.22. However, when the ratio
is, say, 1:9, varying point assignments by ± 5
results in considerably different ratios. Thus, 
5-95 would imply a ratio of 1:19, while 15-85 
would indicate a ratio of 1:5.67. It seems to 
follow that estimates for pairs that are quite 
dissimilar (e.g., requiring judgments reflecting 
ratios greater than 1:3) are bound to be less 
reliably determined than ratios for more similar 
stimuli. It is reasonable to conclude that scale 
values might be more valid if they were derived 
only from judgments of not too dissimilar stimuli. 
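The numerical examples above can be reproduced with a one-line conversion from the lighter stimulus's share of 100 points to the implied ratio. In the sketch below (function name ours), a ±5-point shift around 40-60 moves the implied ratio only between about 1:1.22 and 1:1.86. The same shift around 10-90 moves it between 1:5.67 and 1:19.

```python
def implied_ratio(lighter_points, total=100.0):
    """Heavier:lighter ratio implied by giving `lighter_points` to the lighter stimulus."""
    return (total - lighter_points) / lighter_points


for center in (40, 10):  # 40-60 implies 1:1.5; 10-90 implies 1:9
    low, mid, high = (implied_ratio(center + d) for d in (5, 0, -5))
    print(f"{center}-{100 - center}: 1:{mid:.2f}; +/-5 points spans "
          f"1:{low:.2f} to 1:{high:.2f}")
```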

Experiment II 


Derivation of scale values.—The 
direct estimates of ratios were first 
converted into appropriate point as- 
signments for a total of 100 points 
for each judgment. Converting ratio 
judgments to points was indicated by 
the arithmetical consideration that 
averaging ratios as if they were 
numbers tends to weight the various 
estimates differentially. For example, 
the average of estimates of 2:1, 3:1, 
and 4:1 might appear to be 3:1, but 
2:1 implies that the heavier stimulus
is 2/3 of the total weight; 3:1 implies
that the heavier stimulus was judged
to be 3/4 of the total; and 4:1 implies
it to be 4/5 of the total. If 2/3, 3/4, and
4/5 are averaged, the mean is not 3/4 as
3:1 would imply. By retaining two
decimals in point assignments, it was 
possible to express all ratio judgments 
quite exactly in terms of point as- 
signments. 
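The conversion, and the reason given for it, can be checked numerically. In the sketch below (function name ours), an "X to 1" report gives the heavier stimulus 100X/(X + 1) points, rounded to two decimals. Averaging the shares 2/3, 3/4, and 4/5 gives 133/180 rather than 3/4, and that share implies a ratio of 133/47, about 2.83:1.

```python
from fractions import Fraction


def ratio_to_points(x, total=100):
    """Convert an 'X to 1' judgment into (heavier, lighter) points, two decimals."""
    heavier = total * x / (x + 1)
    return round(heavier, 2), round(total - heavier, 2)


judgments = [2, 3, 4]                                # 2:1, 3:1, 4:1
naive_mean_ratio = sum(judgments) / len(judgments)   # 3.0 if ratios are averaged

# Shares of the total given to the heavier stimulus: 2/3, 3/4, 4/5.
shares = [Fraction(x, x + 1) for x in judgments]
mean_share = sum(shares) / len(shares)               # 133/180, not 3/4
implied = mean_share / (1 - mean_share)              # 133/47, about 2.83

print(ratio_to_points(2), naive_mean_ratio, mean_share, round(float(implied), 2))
```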

Ratios for all stimulus pairs were 
computed from total point assign- 
ments and eight estimates of the ratio 
of each stimulus to Stimulus A were 
determined. With A, the lightest 
weight, as the unit of the scale, the 
average of the eight estimates of a 


ratio becomes the scale value for the 
stimulus involved. 

Table 5 presents the obtained scale 
values and indicates the difference 
between psychological and physical 
scale values. For comparisons, these
data should be studied along with 
the data of Table 3, especially those 
scale values computed from ratios of 
the stimuli to Stimulus A. 

Scale values determined from aver- 
age ratio estimates when the ratios 
were reported directly differ con- 
siderably more from the measured 
values than do the scale values ob- 
tained by dividing a constant sum of 
100 points. This is particularly true 
for the heavier stimuli of the series, 
where psychological weight increases 
rapidly with physical weight. The 
effect of changing the way in which 
ratio judgments are expressed is to 
make the curve relating psychological 
scale values to physical measurement 
more positively accelerated and to 
change from predominantly under- 
estimation of physical values to pre- 
dominantly overestimation. 

TABLE 5

Comparison of Scale Values and Estimates of Ratios Between Adjacent
Stimuli Determined by Measurement and by Direct Estimation
of Perceived Ratios

Scale Values
Stimulus             A      B      C      D      E      F      G      H      I
Measured          1.00   1.66   2.30   3.62   3.93   5.24   6.36   6.91   8.48
Ratios expressed
  directly        1.00   1.59   2.24   3.89   4.14   7.06   9.61  11.95  16.11
Difference           -   -.07   -.06    .27    .21   1.82   3.25   5.04   7.63

Ratios Between Adjacent Stimuli
                   B/A    C/B    D/C    E/D    F/E    G/F    H/G    I/H
Measured          1.66   1.38   1.57   1.09   1.33   1.21   1.09   1.23
Ratios expressed
  directly        1.59   1.40   1.71   1.18   1.52   1.36   1.24   1.36
Difference        -.07    .02    .13    .09    .19    .15    .15    .13

Comparisons of ratios between ad-
jacent stimuli.—Another comparison
of interest involves the estimates of
the ratios between adjacent stimuli
as provided by the two methods of 
reporting judgments. While avoiding 
the use of successive multiplications 
for determining scales seems desirable, 
criticism of Comrey’s procedure does 
not invalidate the estimates of ratios 
between adjacent stimuli themselves. 
For such a comparison, the ratios 
between adjacent stimuli were de- 
rived from point assignment data and 
not simply by appropriate use of the 
scale values presented in Table 5. 
The lower part of Table 5 shows the 
resultant ratios for directly expressed 
ratio judgments, and these data 
should be compared with the data at 
the bottom of Table 2. 

Ratios between adjacent stimuli 
were overestimated more when stim- 
ulus ratios were judged directly than 
when point assignments were re- 
ported. The extent of the differences 


between obtained and measured ra- 
tios, however, appears small in com- 
parison with the degree of overesti- 
mation noted in the final scale values. 
But an overestimation of .13 for the
ratio I/H, for instance,
means that, if H has a scale value
of 12, then I will have a scale value
of 16.32, 1.56 higher than the physi-
cally measured ratio would imply.
On the other hand, the same dis- 
crepancy for D/C would mean that, 
given C = 2.2, D will have a scale 
value of 3.76, only .28 higher than the 
physical measurement of the ratio 
would imply. Consistent overesti- 
mation of adjacent ratios will result 
in larger discrepancies between physi- 
cal and psychological scale values the 
farther up the weight continuum we 
go. 
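The two worked examples and the claim that consistent overestimation accumulates up the series can both be checked from the adjacent ratios of Table 5. This is an illustration only. Chaining ratios by successive multiplication is the kind of procedure the authors caution against for deriving scales. The measured D/C ratio is taken here as 3.62/2.30, or about 1.57.

```python
from itertools import accumulate
from operator import mul

# Adjacent-stimulus ratios B/A, C/B, ..., I/H (lower part of Table 5).
measured_adj = [1.66, 1.38, 1.57, 1.09, 1.33, 1.21, 1.09, 1.23]
judged_adj = [1.59, 1.40, 1.71, 1.18, 1.52, 1.36, 1.24, 1.36]


def chain(adjacent):
    """Scale values for A..I obtained by multiplying adjacent ratios, with A = 1."""
    return [1.0] + list(accumulate(adjacent, mul))


# The worked cases in the text: one step up from H = 12 and from C = 2.2.
print(round(12 * judged_adj[-1], 2), round(12 * (judged_adj[-1] - measured_adj[-1]), 2))  # 16.32 1.56
print(round(2.2 * judged_adj[2], 2))  # 3.76

# Gap between chained judged and chained measured values, from A to I.
gaps = [j - m for j, m in zip(chain(judged_adj), chain(measured_adj))]
print([round(g, 2) for g in gaps])
```

From B onward the gap grows at every step, which is the accumulation described in the paragraph above.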

TABLE 6

Comparisons Between Judged and Measured Ratios for Individual Pair
Comparisons When Ratios Are Expressed (I) Directly,
and (II) in Terms of Point Assignments

(Obtained Ratio/Measured Ratio)

(I)
Stimulus      A     B     C     D     E     F     G     H     I
A             -  1.09  1.07  1.13  1.37  1.34  1.42  1.33  1.50
B           .92     -  1.09  1.20  1.20  1.41  1.33  2.04  1.89
C           .93   .92     -  1.20  1.23  1.43  1.44  1.69  1.96
D           .88   .83   .83     -  1.15  1.28  1.53  1.60  1.73
E           .73   .83   .81   .87     -  1.13  1.33  1.53  1.76
F           .75   .71   .70   .78   .88     -  1.17  1.19  1.53
G           .70   .75   .69   .65   .75   .85     -  1.13  1.13
H           .75   .49   .59   .62   .65   .84   .88     -  1.12
I           .67   .53   .51   .58   .57   .65   .88   .89     -

(II)
Stimulus      A     B     C     D     E     F     G     H     I
A             -   .89   .83   .82   .85   .91  1.11  1.00  1.13
B          1.12     -   .96   .88   .98  1.02  1.20  1.16  1.38
C          1.20  1.05     -   .96   .91  1.03  1.06  1.16  1.45
D          1.22  1.13  1.04     -  1.02  1.02  1.11  1.22  1.23
E          1.18  1.02  1.10   .98     -   .99  1.04  1.11  1.25
F          1.09   .98   .97   .98  1.01     -  1.03  1.04  1.08
G           .90   .83   .94   .90   .96   .97     -  1.02   .99
H          1.00   .86   .86   .82   .90   .96   .98     -  1.03
I           .88   .72   .69   .81   .80   .93  1.01   .97     -

Ratio estimates from individual pair
comparisons.—Since averaged ratios
do not reflect completely the char-
acteristics of individual judgments,
an additional analysis and comparison
was undertaken. Each point divi-
sion, whether derived from direct
estimation of ratios or not, implies
a perceived ratio between the stimuli
involved, and physical measurement
supplies another ratio value. The
relationship between perceived weight 
and physically measured weight for 
each individual comparison may be 
indicated by the judged ratio divided 
by the measured ratio, which repre- 
sents the slope of the function relating 
the two types of measurements. 
Table 6 gives the slopes for individual 
pair comparisons when ratios were 
expressed directly (Exp. II) and when 
ratios were expressed by point as- 
signments (Exp. I). The upper half 
of each table gives the slopes of the 
psychological scale as a function of 
the physical scale, while the lower 
half shows the slopes of the physical 
scale as a function of the psychological 
scale. Another way to speak of these 
relationships is in terms of the distance 
between stimuli on the psychological 
scale in comparison with the distances 
on the physical scale. A value greater 
than 1.00 in the upper half of the 
tables, or less than 1.00 in the lower 
half, implies that the distance between 
stimuli on the psychological scale is 
greater than on the physical scale. 
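Each Table 6 entry is a judged ratio divided by a measured ratio. The following minimal sketch computes that quotient for a single judgment. It uses hypothetical point divisions and puts the heavier stimulus in the numerator, the orientation described for the upper half of each table. How the lower-half entries were tabulated is not reconstructed here.

```python
def slope(points_heavier, points_lighter, weight_heavier, weight_lighter):
    """Judged ratio (from a point division) divided by the measured ratio.

    Values above 1.00 mean the heavier stimulus was judged relatively heavier
    than physical measurement implies, i.e. the psychological distance
    between the pair exceeds the physical distance.
    """
    judged = points_heavier / points_lighter
    measured = weight_heavier / weight_lighter
    return judged / measured


# Hypothetical divisions of 100 points between B (1.66 units) and A (1.00 unit).
print(round(slope(60, 40, 1.66, 1.00), 2))  # 0.9, distance underestimated
print(round(slope(64, 36, 1.66, 1.00), 2))  # 1.07, distance overestimated
```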
The data of Table 6 clearly show 
that the physical distances between 
stimuli tend to be overestimated more 
as the stimuli involved in the pair 
comparisons become less and less
similar. This is true whether ratios 
are reported in terms of point assign- 
ments or indicated directly. How- 
ever, when ratios were expressed 
directly, all physical distances were 
overestimated, while with point as- 
signments the more similar pairs show 
underestimation of the physical dis- 


tances. This judgmental character- 
istic is not merely dependent upon 
the absolute size of the stimuli. 
While space restrictions prevent sepa- 
rate tables of comparisons when the 
lighter stimulus was lifted first or 
second, the effects of similarity on 
ratio judgments in Exp. II, as in 
Exp. I, were found to be particularly 
marked when the lighter stimulus was 
lifted first. Thus, varying the method 
of reporting judgments does not 
eliminate the relationship between 
stimulus similarity and ratio judg- 
ments. 

Inspection of Table 6 makes clear 
also how average ratios can obscure 
various features of the original judg-
ment data. Ratios in individual pair
comparisons can be consistently
overestimated and yet yield average
ratios between stimuli that are fairly
close to physical measurements, or are
even underestimates. In Table 5 the average
ratio for B/A, for example, was an 
underestimate of the physically meas- 
ured ratio. But the upper half of 
Table 6 shows that distances between 
both A and B and all other stimuli 
were consistently overestimated. To 
get the average ratio B/A, use is made
of all possible values of (B/X) ÷ (A/X),
where X is any larger stimulus in the
series. Thus, the crucial factor de- 
termining the underestimation of the 
average B/A is the relative amounts 
of discrepancy in B/X and A/X. In 
this case the ratio B/X is generally 
underestimated more than the cor- 
responding A/X, signifying that the 
distance between B and X is over- 
estimated more than is the distance 
between A and X. Consequently 
when B/X is divided by A/X the 
resulting estimate of B/A is too small 
to be an accurate picture of the judged 
relationship. The same general anal- 
ysis may be made for all averages of 








304 KATHERINE E. BAKER AND FRANK J. DUDEK 


direct and indirect estimates of ratios 
between any stimuli of a set. Be- 
cause there appear to be different 
biases entering the judgments when 
different pairs of stimuli are employed, 
the combining of judgment data to 
obtain various indirect estimates may 
very possibly yield an average ratio 
which is actually quite far from repre- 
senting the characteristics of any of 
the individual judgments. 
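The B/A argument can be illustrated numerically. The shortfall factors below are hypothetical. They were chosen only so that both A/X and B/X are underestimated (both distances overestimated), with B/X underestimated more, as the text describes.

```python
# Measured weights in units of A (B and X = F taken from the measured scale).
A, B, X = 1.00, 1.66, 5.24

# Hypothetical factors by which each judged ratio falls short of the measured one.
shortfall_A_X = 0.85   # judged A/X = 0.85 * measured A/X
shortfall_B_X = 0.80   # B/X is underestimated more than A/X

judged_A_X = shortfall_A_X * A / X
judged_B_X = shortfall_B_X * B / X

indirect_B_A = judged_B_X / judged_A_X   # (B/X) / (A/X)
print(round(indirect_B_A, 2), B / A)     # 1.56 1.66
```

Both distances to X are overestimated, yet the indirect estimate of B/A falls below the measured 1.66. This matches the pattern the authors describe.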


DISCUSSION


While certain characteristics of judg- 
ments appear with both directly ex- 
pressed ratio judgments and point as- 
signment judgments, it is notable that 
the scale values yielded by the two 
techniques are not consistent in a number 
of respects. It is pertinent to ask what
gives rise to these inconsistencies.
It does not seem reasonable to argue that 
the manner of expressing judgment 
affects perceptions per se, but rather it 
would appear that the task involved in 
communicating the perceptions is a 
crucial problem in evaluating judgment 
data. 

No claim is made that direct esti- 
mation of ratios is a better or worse 
technique than point divisions. Such 
evaluation must await tests of the 
predictive power of scales so derived. 
However, it is relevant to point out that 
direct estimates do make for additional 
inconvenience in computations. Also 
the difference in degree of refinement of 
discrimination which can be expressed 
by the two methods of reporting judg- 
ments deserves note. For similar stimuli 
the point assignment procedure seems to 
lead Ss to make judgments implying 
finer discriminations than are reported 
if ratios are expressed directly. While 
point assignments like 49-51 or 48-52 
(implying ratios of 1:1 1/25 and 1:1 
1/12, respectively) are not unusual, 
such fine discriminations are not ordi- 
narily reported when ratios are expressed 
directly. For judgments of dissimilar 
stimuli, restriction to integral point 
assignments allows only relatively gross 


ratio units to be expressed compared to 
direct judgments of ratios. However, 
it is possible that the precision afforded 
by the latter method is beyond what is 
meaningful with respect to discrimi- 
nation abilities of Ss. Typically, agree- 
ment among Ss for large ratios is poor. 
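The resolution argument follows from the ratio implied by an integer split p:(100 - p). The sketch below shows how finely spaced the expressible ratios are near an even split and how coarse they become near the extremes. The helper name `split_ratio` is our own.

```python
def split_ratio(heavier_points, total=100):
    """Ratio implied by giving `heavier_points` of `total` to the heavier weight."""
    return heavier_points / (total - heavier_points)


# Fine steps near an even split, coarse steps near the extremes.
for p in (50, 51, 52, 90, 91, 92, 95, 96):
    print(f"{p}-{100 - p}: {split_ratio(p):.3f}")
```

Near 50-50, successive splits differ by about .04 in implied ratio. Moving from 95-5 to 96-4 jumps the implied ratio from 19 to 24.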

The two psychological scales derived 
in the present study, together with the 
fractionation method results mentioned 
earlier and four other recently reported 
(4, 5, 6) scales, make a total of seven 
psychological scales of weight currently 
available. Each scale makes some claim 
to ratio properties as defined by Stevens 
(9), since each is based on ratios as im- 
plied by judgments made by Ss. The 
notable fact about the obtained scales is 
that they do not agree with each other, 
as is seen from Fig. 1.⁵ Here the various
scales have been made comparable by 
providing a common basis in veg units, a 
psychological unit first suggested by 
Harper and Stevens (7). It is our 
purpose now to examine the method- 
ologies in deriving the various scales and 
to orient the results of the two experi- 
ments just reported within the general 
framework of weight scaling. 

Three of the experiments (5, 6, 7) 
made use of the fractionation method, 
while our two experiments and two others 
(5, 6) utilized the constant sum method. 
The fractionation method requires Ss to 
designate weights that are perceived to 
be half the weight (W½) of each of a
number of standard weights (W). W
represents some distance on the weight
dimension from an implied absolute zero
to the particular weight. Thus, W½, by
definition, is that physical weight which 
lies, psychologically, half-way between 
zero and the standard. Given a number 


⁵ Curves are from veg-weight equations de-
rived by Armington’s (1) procedure developed
for fractionation method data. For constant
sum method data it is necessary to obtain first
a set of W’s and W½’s before applying the pro-
cedure. This was done by finding the physical 
weights corresponding to psychological scale 
values and halves of these values on the original 
psychological scales with the lightest weight of 
the series as the unit. Thus, the curves as 
presented differ somewhat from those reported 
in the original articles by the authors cited. 
of W and W½ judgments, it is possible to
generate a function relating psychological 
judgments to physical measurements of 
weights. The constant sum method, as 
has been seen, gives scale values in terms 
of perceived multiples of a chosen unit 
stimulus and the scale is derived from 
absolute ratio judgments implied by 
division of a constant sum of points. 
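One way to turn standard weights and their judged half-weights into a psychophysical function is sketched below, as an illustration only. It assumes a power function ψ = kWⁿ, so that each halving satisfies W/W½ = 2^(1/n). That form is an assumption. The curves of Fig. 1 were obtained with Armington's (1) procedure, which is not reproduced here. `exponent_from_halvings` is a hypothetical name.

```python
import math


def exponent_from_halvings(standards, halves):
    """Estimate n in psi = k * W**n from fractionation settings.

    Assumption for illustration: if psi(W_half) = psi(W) / 2 and psi is a
    power function, then W / W_half = 2**(1/n), so n = log 2 / log(W / W_half).
    The log ratios are averaged over all standards before solving for n.
    """
    logs = [math.log(w / h) for w, h in zip(standards, halves)]
    return math.log(2) / (sum(logs) / len(logs))


# Synthetic check: halves generated with n = 1.5 recover it, and settings
# equal to the physical half-weights give n = 1.
standards = [100.0, 250.0, 500.0, 1000.0, 2000.0]
halves = [w * 2 ** (-1 / 1.5) for w in standards]
print(round(exponent_from_halvings(standards, halves), 3))                      # 1.5
print(round(exponent_from_halvings(standards, [w / 2 for w in standards]), 3))  # 1.0
```

Half-weight settings that fall above the physical half (as suspected for the Harper and Stevens context) yield n > 1, which corresponds to a more positively accelerated curve.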

Fig. 1. Weight scales from seven scaling experiments. [Abscissa: physical weight (grams), up to 2000.]

Experiments using fractionation.—
Harper and Stevens (7) made the first
attempt to derive a ratio scale for
weights. Six W’s between 40 and 2000 
gm. were used. The S selected W½ from
a set of six comparison weights for each
W. Each W had a different set of com-
parison weights which were not distri-
buted symmetrically around the physical
half-weight, and for many of the W’s
the range of comparison weights did not
extend down far enough to include the
physical W½. The resulting scale, as
seen in Fig. 1, differs markedly from the
physical scale and indicates the greatest
rate of change of psychological weight 
per unit change in physical weight of any 
of the scales shown. 

Guilford (5, 6) modified the experi- 
mental procedure by providing Ss with 
a single large set of comparison weights 
which included the physical W½ for all
W’s, as well as other weights, after noting 
that Harper and Stevens’ selection of 
comparison weights might have intro- 
duced certain series effects into the 
judgment context. Woodworth and 
Schlosberg (10) also commented upon 
the likely tendency of Ss to choose the 
middle stimulus of the comparison set, 
and Garner (4) showed that this was in 
fact what Ss did for loudness fractiona-
tion. If it is true that in the Harper and
Stevens experiment the judgment con-
text tended to make Ss choose W½’s that
were too large, then Guilford’s procedure
should result in a scale that is more like 
the physical scale and where psycho- 
logical units do not increase so rapidly 
for a given change in physical weight. 
Figure 1 shows that this was indeed the 
result. 

While the two studies above made use 
of the constant stimulus model in em- 
ploying the fractionation technique, an 
unpublished study in our laboratory by 
Joy⁶ utilized the method of limits as a
model. Other investigators apparently 
had not considered it feasible to vary 
weight stimuli continuously. Joy 
achieved this simply by having Ss lift a 
hidden container from which measured 
amounts of water could be subtracted, 
or to which measured amounts of water 
could be added. When comparison 
weights are presented in this way, the 
series effects noted above are eliminated. 
From this standpoint Joy’s procedures 
should accomplish much the same results 
as did Guilford’s. In addition, keeping 
comparison weights out of Ss’ view should
help control any size-weight illusion that 
may be present if discrete weights are 
picked up from a table top. Further, 
the method of limits model permits 


6 Further information concerning Joy’s experi- 
ment may be obtained from the authors. 


presentation of weights near the crucial 
exact W½ and thus avoids influencing
judgments by preceding them with many 
contrasting stimulations. Of course con- 
stant errors of habituation and expec- 
tation must be controlled by having equal 
numbers of ascending and descending 
series. The curve for Joy’s data shows 
only a slight positive acceleration in 
relation to the physical scale. Thus, 
when fractionation is carried out under 
even more controlled conditions, a dif- 
ferent scale from either that of Guilford 
or of Harper and Stevens results. On 
logical grounds it would seem that Joy’s 
procedure should provide the most valid 
scale of the three in the sense that more 
possible sources of error have been 
controlled. 

Experiments utilizing the constant sum 
method.—Guilford (5, 6) also reports a 
scale using the usual constant sum 
method. Weights representing a range 
similar to that used by Harper and 
Stevens were scaled. The Ss divided 
points between pairs of stimuli and scale 
values were determined from these point 
assignments. The resulting scale differs 
markedly from Harper and Stevens’ 
scale, but it is not too far from the 
fractionation scale Guilford obtained by 
his improved procedure. 

While representing only half the range 
of Guilford’s study, our Exp. I data 
probably come closest to being an inde- 
pendent replication of any of the seven 
experiments cited. The scale obtained, 
as shown in Fig. 1, agrees quite closely 
with Guilford’s scale obtained under 
roughly comparable conditions. This 
may attest to the reliability with which 
such a scale may be determined. It will 
be noted that the curve for these data 
rises more sharply for heavier weights 
than does Guilford’s. This may be due 
to the fact that the former is based on a 
relatively short range. The part of the 
curve beyond a physical weight of 900 
gm. is an extrapolation from the data, 
and below this value the two curves agree 
fairly well. 

Our Exp. II represented a modification 
of the usual constant sum technique. 
Fig. 1 shows clearly the increased positive 
acceleration noted earlier as the effect of 
change in S’s communication task.
These data bear about the same relation 
to Guilford’s scale as do the data ob- 
tained in Exp. I but the difference is 
somewhat exaggerated. 

An experiment is reported by Guilford 
(5, 6) to illustrate how the constant sum 
method may be used to find perceived 
ratios among several stimuli presented 
simultaneously. The Ss in this experi- 
ment divided 100 points among five 
weights rather than between pairs of 
stimuli. Judgments were made for two
sets of five weights with one weight 
common to both sets so that ratios 
between heavy stimuli could be made to 
form a continuous scale with lighter 
stimuli. Figure 1 shows that the re- 
sulting scale differs somewhat from the 
other scales and lies intermediate between his
scales obtained by fractionation and by 
the constant sum method with paired 
stimuli. That this scale differs from 
that for paired stimuli indicates that 
making point assignments among five 
stimuli represents a somewhat different 
task, perhaps involving additional 
sources of constant errors. 

In conclusion, it may be stated that 
the two scales derived in the present 
experiments are not completely con- 
sistent with each other nor are the other 
scales for weight consistent among them- 
selves or with our scales. At least for 
the time being, interpretations of such 
scales must take into account the scaling 
technique employed, the specific con- 
ditions of judging, and the mode of 
reporting judgments. That is, a rigor- 
ously operational approach seems indi- 
cated. If all these scales do actually 
have the ratio properties claimed, it must 
be concluded that they do not all repre- 
sent scales of the exact same variables. 
At any rate, the different scales obtained 
demand explanation and it does not seem 
likely that differences can be dismissed 
as due merely to unreliability. 

Questions may be raised concerning 
the validity of the claims for ratio proper- 
ties for the weight scales compared. 
Here there is no substitute for extensive 
investigations of the internal consistency 
and predictive value of the different 
scales. Such information is largely lack- 


ing at present and the apparently equally 
logical rationales for the techniques are, 
for the most part, all that is available for 
critical analysis. One recent study by 
Guilford and Dingman (6) does concern 
itself with this problem, however. They 
were able to show that stimuli chosen 
from the results of fractionation main- 
tain the predicted scale relationships 
when judged by the constant sum 
method. 

One last comment concerns the more 
general import of studies like the ones 
reported here. It does not seem to 
follow at this time that the variance 
among scales necessarily leads to the 
conclusion that none of them is likely 
to be a valid scale. If this is indeed the 
fact, future investigations will indicate 
their lack of any usable predictive power. 
However, to abandon such studies at this 
point might very well prevent the dis- 
covery of much important method- 
ological and factual information. Never- 
theless, a critical and rigorous attitude 
needs to be taken in evaluating scaling 
methods and the data they yield. As 
Guilford has pointed out, since the end 
results of scaling methods of the type 
considered here are supposed to achieve 
the highest level of measurement, one 
should be particularly sensitive to any 
defects the methods may have. 


SUMMARY 


To investigate various aspects of methodology 
and characteristics of ratio judgments in the 
constant sum method of scaling, two experiments 
were carried out with nine weights ranging be- 
tween 108.5 and 919.8 gm. Experiment I in- 
volved 20 Ss who divided 100 points between 
pairs of weights to express ratio judgments, while 
Exp. II had 10 Ss express their ratio judgments 
of the same pairs directly, i.e., say how many 
times heavier the one weight felt in comparison 
to the other. With appropriate computations 
the judgment data were used to derive psycho- 
logical scales for the two conditions. The main 
results of the experiments were as follows: 


1. Experiment I yielded a scale which fairly 
closely approximated the physical scale, but 
there was a tendency to overestimate heavier 
weights and to underestimate lighter weights. 
The effect of changing the way ratio judgments
were expressed was to make the function re-
lating physical and psychological scales more
positively accelerated, with overestimation
throughout the range of weights used.

2. Successive multiplication in the procedure
utilized in computing scale values from judgment
data was shown to introduce an artifact which
influenced final scale values.

3. Averaged judged ratios between adjacent
stimuli in the series reflected the patterns of
over- and underestimation seen in the relation-
ships between psychological and physical scales
for both methods of reporting judgments.

4. Individual pair comparisons revealed that
in both experiments judgments were biased by
the similarity between stimuli within a pair and
by whether the lighter or the heavier stimulus
was lifted first. Overestimation of the physical
measurements of stimuli increased with in-
creasing dissimilarity and the effect was more
marked if the lighter stimulus was lifted first.

The obtained scales were compared with five
other weight scales. It was concluded that
different methodologies permit different sets of
judgment determinants to operate; thus none
of the scales agrees completely with any other
scale.

REFERENCES

2. Comrey, A. L. An operational approach to
some problems in psychological measurement.
Psychol. Rev., 1950, 57, 217-228.

3. Comrey, A. L. A proposed method for
absolute ratio scaling. Psychometrika,
1950, 15, 317-325.

4. Garner, W. R. Context effects and the
validity of loudness scales. J. exp.
Psychol., 1954, 48, 218-224.

5. Guilford, J. P. Psychometric methods.
THE EFFECTIVENESS OF SIZE CUES TO RELATIVE
DISTANCE AS A FUNCTION OF LATERAL
VISUAL SEPARATION¹


WALTER C. GOGEL AND GEORGE S. HARKER 


Army Medical Research Laboratory 


There have been many studies 
concerned with the role of size in the 
perception of distance. Some of these 
(1, 4, 5, 6, 7, 8, 9) have compared the 
relative contribution (the effective- 
ness) of the size cue with that of 
other cues upon the perception of 
relative distance. However, the pos- 
sibility should be considered that the 
relative contribution of a distance cue 
may depend not only upon the type 
and number of other cues present, 
but also upon conditions which are 
not themselves cues to distance. 
Consider the case in which a stereopsis 
cue is introduced in opposition to a 
size cue so that the stereopsis cue 
dominates the perception. The con- 
ditions of the situation in which this 
observation occurs might be such as 
to maximize the effectiveness of the 
stereopsis cue and to minimize that 
of the size cue. Under other con- 
ditions the size cue rather than the 
stereopsis cue might be dominant. 
The effectiveness of a distance cue as 
a function of various conditions needs 
to be specified before it will be possible 
to predict the relative contribution of 
a particular distance cue in a variety 
of situations. The condition with 
which the present study is concerned 
is the amount of lateral separation 
between two similar but differently 
sized objects. The question of this 
study is whether the effectiveness of 
relative size in determining the per- 


1 The authors wish to thank John P. Tammaro 
for his assistance in collecting and analyzing the 
data, and also Kay Inaba and Robert L. Brune 
for their assistance in the analysis of the data. 


ceived relative distance of two objects 
changes as the lateral visual sepa- 
ration of these objects is changed. 
This question is investigated in the 
case in which the size cue is opposed 
by a stereopsis cue, and also in the 
case in which the stereopsis cue is 
effectively absent. 


METHOD OF MEASUREMENT


To answer the question of this study, it is 
necessary to determine when the effectiveness of 
the relative size cue has changed. It will be 
concluded that the effectiveness of a relative size 
cue has increased when the apparent depth 
between the two objects which involve the size- 
cue changes in the direction of the perception 
expected solely from their size difference. This 
requires that a method of measurement be used 
which is sensitive to changes in the apparent 
relative distance of objects. By measuring the
change in the apparent relative distance of the 
two objects as a consequence of a change in their 
lateral visual separation, any change in the 
effectiveness of the size cue as a function of 
lateral separation will be determined. A method 
of measuring the apparent relative distance of 
binocular objects has been briefly discussed 
previously (2, p. 341). The application of this 
method to the present study is illustrated by the 
schematic drawings of Fig. 1. 

In the top view drawing of Fig. 1a, the two
solid horizontal lines represent the physical
positions of two simultaneously presented objects
A and B, which differ only in size and which are
located at the same distance from S. If the
position of S were indicated, it would be shown
well below the top view drawings. As shown in
the front view drawing of Fig. 1a, the inner
edges of A and B are separated laterally by a
distance L. The long arrows in the top view
drawings represent the path along which a small
binocularly viewed disc is movable in distance.
The small disc is represented by the circle in the
front views of Fig. 1. As indicated, the disc is
movable in distance toward or away from S
along a visual direction which is very similar to
the visual direction of the left object, but is
displaced to the left from the visual direction
of the right object. The apparent depth be-
tween objects A and B is measured by the physi-
cal difference between the distance adjustment
at which the disc appears equidistant with A
and the distance adjustment at which the disc
appears equidistant with B. When A and B
are both viewed binocularly it is important that
the visual direction of the path of distance move-
ment of the disc remains unchanged for the
adjustment of the disc to apparent equidistance
with either A or B. Throughout this study, the
path of distance movement of the disc was that
indicated by the long, arrowed lines of Fig. 1.
The experimental justification for this method
of measuring apparent relative distance is given
in the previous study (2).

Fig. 1. Schematic drawings of a method of
investigating the apparent depth between two
objects A and B.

Since object B in Fig. 1a is larger than the
similar object A, B may appear to S to be closer
than A even though A and B are physically
equidistant. Consider the case where objects A
and B and the disc are all viewed binocularly.
If S adjusts the disc in distance until it appears
equidistant with A, the resulting physical
distance position of the disc should be the same
as that of A. The physical distance position
which A would have to occupy in order to be
physically equidistant with the adjusted position
of the disc is indicated by the horizontal dotted
line A’ of Fig. 1a. (The line A’ is slightly
displaced from the line A in Fig. 1a for clarity
of presentation.) If S adjusts the disc in dis-
tance until it appears equidistant with B in the
situation represented by Fig. 1a, the resulting
physical distance of the disc may be considerably
less than the physical distance of B from S.
This is a consequence of B appearing to be less
distant than A. The physical distance position
which B would have to occupy in order to be
physically equidistant with the adjusted position
of the disc is indicated by the horizontal dotted
line B’. The position of B’ in Fig. 1a represents
an instance in which the disc appears equidistant
with B when it is physically considerably in front
of the physical position of B.

If B appears to S to be less distant than A,
B’ will be physically less distant than A’. The
physical distance difference between A’ and B’
is taken as a measure of the apparent relative
depth between A and B. It is as though, in
adjusting the disc to apparent equidistance with
B, the disc is adjusted in distance by the stere-
opsis between the disc and A until the disc
appears as far in front of A as B appears in
front of A. The apparent depth between A and
B would not have been measured, however, if,
following the apparent equidistance adjustment
to A, the disc had been moved to the directional
vicinity of B before making the apparent equi-
distance adjustment to B. In this case, since
A and B are physically equidistant, the difference
between A’ and B’ would be approximately zero
regardless of the apparent depth between A and
B (2).

The physical adjustment of the disc to ap-
parent equidistance with A and B in the situation
represented by Fig. 1a may possibly differ even
when A and B are the same size. This difference
may be a consequence of the lateral displacement
L between A and B, independent of the occur-
rence of a size difference between A and B. A
control situation is illustrated by Fig. 1b. The
situation illustrated by Fig. 1a will be called an
experimental situation. The only difference
between an experimental and control situation
is that in a control situation object B is the same
size as object A, i.e., A and B are identical. The
difference between A’ and B’ in the control
situation can be subtracted from the difference
between A’ and B’ in the experimental situation.
The difference which remains is attributed to
the effect of the size difference between A and
B in the experimental situation.
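The control subtraction is a difference of differences. The following minimal sketch uses hypothetical disc settings, given as physical distances from S in centimeters. The sign convention, A’ minus B’ with positive meaning that B appears nearer, is our own choice.

```python
def apparent_depth(a_prime, b_prime):
    """A' - B' in cm: positive when the disc had to be set nearer to S for B
    than for A, i.e. when B appeared closer than A."""
    return a_prime - b_prime


def size_effect(exp_a, exp_b, ctrl_a, ctrl_b):
    """Experimental A'-B' minus control A'-B', attributed to the size difference."""
    return apparent_depth(exp_a, exp_b) - apparent_depth(ctrl_a, ctrl_b)


# Hypothetical disc settings (physical distances from S, in cm).
print(size_effect(exp_a=200.0, exp_b=185.0, ctrl_a=200.0, ctrl_b=198.0))  # 13.0
```

Here 2 cm of the 15 cm experimental difference would be attributed to lateral displacement alone. The remaining 13 cm would be attributed to the size cue.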

If the disc and A are viewed binocularly
while B is viewed monocularly, the same method
of measuring apparent relative depth can be
used. In this case S, in adjusting the disc to
apparent equidistance with B, uses the stereopsis
between A and the disc in order to move the
disc to the same apparent distance as B. The
difference between A’ and B’ is again a measure
of the apparent depth between A and B.


METHOD OF GENERATING
THE DISC


The disc was stereoscopically generated rather 
than being an actual object. The instrument 
for generating the disc was the same as that used 
in the previous study (2) except that this instru- 
ment was renovated and recalibrated. This 
stereoscopic instrument was a device for pro-
jecting two identical light images, one into each
eye of S. This was accomplished by using a pair
of reflex sights, with one reflex sight before each 
eye of S. Each reflex sight consisted essentially 
of a circular aperture source of light, a Mangin 
mirror, a surface which simultaneously reflected 
and transmitted portions of the incident light, 
and a viewing aperture. Light from the aper- 
ture source was transformed into a beam of 
parallel light by the Mangin mirror and, after 
being reflected from the reflecting surface, 
entered S’s eye through the viewing aperture. 
This resulted in a disc of light (subtending 15’ 
of visual angle) falling on each retina with each 
disc independently generated. When S turned 
an adjustment knob, the right reflex sight rotated 
so as to change the direction at which the light 
entered the right eye. The left reflex sight re- 
mained fixed. The reflecting-transmitting sur- 
face permitted S to see simultaneously the disc 
and objects which were actually located in front 
of himself. When the two retinal disc images 
were binocularly fused, the disc appeared to be 
located in front of S at some distance with respect 
to the actual object. Turning the adjustment 
knob changed the retinal disparity between the 
binocularly fused disc and the actual object, with 
the result that the disc appeared to move in 
distance and could be adjusted by S to the 
apparent distance of the actual object. The 
instrument knob was linked with an indicator. 
From the interpupillary distance of S and the 
calibration constant of the instrument, the dis- 
tance of the disc in centimeters which corres- 
ponded to any value of the indicator, and thus 
any setting of the adjustment knob, could be 
calculated. For the purpose of this study, the 
stereoscopically generated disc may be regarded 
as though it were an actual object which S could 
move in distance toward and away from himself. 
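The article does not give the formula that converts an indicator reading into disc distance, only that it depends on S's interpupillary distance and the instrument's calibration constant. As a rough geometric sketch, assuming a target straight ahead of S and symmetric convergence, the convergence angle θ and distance D are related by D = p / (2 tan(θ/2)), where p is the interpupillary distance. The function names and the 6.4-cm interpupillary distance are hypothetical illustrations; the instrument's actual calibration constant and indicator scale are not reproduced here.

```python
import math


def convergence_angle(distance_cm: float, ipd_cm: float) -> float:
    """Convergence angle (radians) for a point straight ahead at distance_cm.

    Assumes symmetric viewing: each eye turns inward by atan((ipd / 2) / distance).
    """
    return 2.0 * math.atan((ipd_cm / 2.0) / distance_cm)


def distance_from_angle(angle_rad: float, ipd_cm: float) -> float:
    """Inverse of convergence_angle: distance (cm) that produces a given angle."""
    return (ipd_cm / 2.0) / math.tan(angle_rad / 2.0)


def arcmin(angle_rad: float) -> float:
    """Convert radians to minutes of arc."""
    return math.degrees(angle_rad) * 60.0


if __name__ == "__main__":
    ipd = 6.4  # hypothetical interpupillary distance in cm
    for d in (303.0, 278.0):
        print(f"{d:.0f} cm -> convergence {arcmin(convergence_angle(d, ipd)):.1f} arcmin")
    # Relative disparity between two hypothetical target distances.
    disparity = convergence_angle(278.0, ipd) - convergence_angle(303.0, ipd)
    print(f"relative disparity: {arcmin(disparity):.2f} arcmin")
```

Under this assumed geometry, moving the disc nearer increases the convergence angle, which is why a knob that rotates one reflex sight can shift the disc's apparent distance.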


EXPERIMENT I 


Apparatus.—Figure 1a also can be used to
illustrate the experimental situations of Exp. I.
Two “seven-of-spades” playing cards were used.
Object A (the left card) was a half-sized playing
card (4.5 cm. by 2.9 cm.), and object B (the
right card) was a double-sized playing card
(18.0 cm. by 11.6 cm.). Both of these cards
were presented at the same time in an objective
fronto-parallel plane of S at a distance of 303
cm. from S. As indicated by the long arrow in
the top view of Fig. 1a, the direction in which the
disc moved in distance was always such that it
passed over (10’ of visual angle above) object A.
Two experimental situations were used. These 
experimental situations differed only in the 
lateral distance L separating the inner edges of 
the two cards. In one situation L was 3.8 cm. 
and in the other situation L was 22.9 cm. 

Two control situations were used, one for each 
experimental situation. These are illustrated 
by Fig. 1b. The only difference between the 
experimental and control situations was that, 
in the controls, card B was a half-sized playing 
card instead of a double-sized playing card, i.e., 
in the control situations both A and B were
half-sized playing cards. The centers of the 
cards were all at the same height from the floor 
in all four situations. The S’s eyes were at the 
level of the centers of the cards. The white 
portion of each of the cards was adjusted to the 
same brightness (2.4 ft.-L. as measured with a 
Macbeth Illuminometer). No objects were
visible in S’s field of view except the disc and 
the two playing cards. The remainder of the 
field of view was in darkness. 

Procedure.—Eight men who had some experi- 
ence in using the stereoscopic instrument were 
used as Ss. Each S looked through the viewing 
apertures of the stereoscopic instrument and was 
instructed to adjust the disc to the same distance 
from himself as the inner edge of one of the cards. 
This edge was either the right edge of the left 
card or the left edge of the right card. For 
simplicity, the object to which the disc was ad- 
justed in apparent distance will be called the 
left card or the right card. It was not required 
that S fixate the disc or either of the cards. Each
S made 16 successive adjustments of the disc to
apparent distance equality with each card. The
16 adjustments of the disc to apparent distance
equality with one card in a particular situation
were followed by 16 adjustments to the other card
in the same situation. Following this, S was 
asked to give a verbal estimate of the apparent 
depth between the two cards. 

An experimental situation was directly preceded
or followed by its control. The lateral
separation L between the inner edges of the two
cards was the same in an experimental situation
and its control. The procedure was systematically
varied between Ss with respect to
(a) whether an experimental situation was presented
before or after its control situation, (b)
the order of designating the inner edge of the
right or left card as the object to which the disc
was adjusted in distance, and (c) the order of
presenting the two different amounts of lateral
separation between the cards.


TABLE 1

RESULTS IN CENTIMETERS FROM ADJUSTING THE
DISC TO APPARENT EQUIDISTANCE WITH
TWO PHYSICALLY EQUIDISTANT
BINOCULAR CARDS
(N = 8)

                  Left Card      Right Card     Mean
Condition         Mean    SD     Mean    SD     Diff.

3.8-Cm. Lateral Separation
Experimental      300     ?      300     9.7      0
Control           302     7.?    303     7.1     -1

22.9-Cm. Lateral Separation
Experimental      300     9.6    278    18.4     22
Control           299     7.9    302    11.9     -3


Results.—The summarized results
in centimeters from using the disc
are shown in Table 1.² Each of the
means of Table 1 is an average of eight
scores, one from each S, where each
score is an average of 16 adjustments
of the disc to apparent distance
equality with a particular card in a
particular experimental or control
situation.

2 A summary and analysis of the results in an
angular measure for Exp. I and Exp. II are in-
cluded in Army Med. Res. Lab. Rep. No. 125,
Fort Knox, Kentucky, 1953.

The stereopsis cue between the two
physically equidistant binocular cards
would have tended to make the cards
appear equidistant from S. Opposed
to this binocular factor in the experimental
situations is the size cue, which
would have tended to make the right
card appear closer to S than the left
card. When the lateral distance between
the two differently sized cards
was 3.8 cm., the difference between
the means from adjusting the disc to
the two cards was 0 cm. (300 cm.
minus 300 cm., in Table 1). When,
however, the lateral distance between
the two differently sized cards was
22.9 cm., the mean difference was 22
cm. (300 cm. minus 278 cm.). The
22-cm. difference differed from the
0-cm. difference beyond the .01 level
of confidence (t = 3.8). In calculating
the t values throughout this
study, a distribution of difference
scores was formed from the several
distributions whose mean difference
was being tested for significance.
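The t procedure just described can be sketched as the standard t for correlated (paired) measures, assuming that is what the author computed: each S contributes one difference score, and t is the mean difference divided by its standard error, with n - 1 degrees of freedom. The per-subject numbers in the usage section are hypothetical, since individual scores are not reported.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Return (t, df) for the mean of per-subject differences a - b.

    Position i in both lists is assumed to belong to the same subject.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists covering at least two subjects")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    if standard_error == 0:
        raise ValueError("difference scores have zero variance; t is undefined")
    return mean(diffs) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical left-minus-right scores (cm) for eight Ss at the wider and
    # narrower lateral separations. These are NOT data from the study.
    wide = [30, 12, 25, 18, 40, 9, 27, 15]
    narrow = [3, -2, 6, 1, -4, 0, 5, -1]
    t, df = paired_t(wide, narrow)
    print(f"t = {t:.2f} with {df} df")
```

Forming the differences first, rather than comparing two independent group means, removes between-subject variability, which is why a modest sample of eight Ss can yield a large t.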
Evidently, the apparent depth between
the two differently sized cards
increased when the lateral separation
increased. But in order to conclude
that the effectiveness of the size cue
increased with the increase in lateral
separation, it must be demonstrated
that this increase in apparent depth
would not have occurred in the absence
of size differences between the
cards. For this reason, the results
from the control situations, in which
the left and right cards were the same
size, must be considered. As discussed
previously, the difference between
the two means from a control
situation with a particular lateral
separation should be subtracted from
the difference between the means from
the corresponding experimental situation.
This results (Table 1) in a
difference of 1 cm. (0 cm. minus -1
cm.) for the 3.8-cm. lateral separation,
and 25 cm. (22 cm. minus -3 cm.)
for the 22.9-cm. lateral separation.
The difference of 24 cm. (25 cm.
minus 1 cm.) between these two
differences represents that increase in
apparent depth between the two
cards, with the increase in lateral
separation, which can only be attributed
to the size difference between
the cards. This 24-cm. result was
significantly different from zero beyond
the .01 level of confidence
(t = 3.8). This indicates that the
effectiveness of the size cue to relative
distance increased with the increase
in the visual lateral separation of the
differently sized playing cards.
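The subtraction of control differences from experimental differences described above is a difference of differences, and it can be checked mechanically. The sketch below uses only the Exp. I mean differences stated in the text (left card minus right card, in cm); the dictionary layout and function name are illustrative choices. Because the operation is linear, applying it to group means gives the same value as averaging it over the individual Ss, although the significance test itself needs the per-subject scores.

```python
# Mean (left card minus right card) differences in cm for Exp. I, as stated in
# the text: experimental 300 - 300 and 300 - 278; control -1 and -3.
MEAN_DIFF = {
    ("experimental", 3.8): 300 - 300,
    ("experimental", 22.9): 300 - 278,
    ("control", 3.8): -1,
    ("control", 22.9): -3,
}


def size_effect(separation_cm, mean_diff=MEAN_DIFF):
    """Apparent depth attributed to the size difference at one lateral separation:
    the experimental difference minus the corresponding control difference."""
    return mean_diff[("experimental", separation_cm)] - mean_diff[("control", separation_cm)]


narrow = size_effect(3.8)    # 0 - (-1) = 1
wide = size_effect(22.9)     # 22 - (-3) = 25
increase = wide - narrow     # 25 - 1 = 24
print(narrow, wide, increase)  # prints: 1 25 24
```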

The average verbal report with the 
3.8-cm. lateral separation was that 
the left card was 20 in. behind the 
right card in the experimental situ- 
ation (SD = 40 in.), and 1 in. behind 
the right card in the control situation
(SD = 7 in.). With the 22.9-cm.
lateral separation, the average verbal 
report was that the left card was 34 
in. behind the right card in the experi- 
mental situation (SD = 56 in.), and 
2 in. behind the right card in the 
control situation (SD = 7 in.). The 
difference between the average verbal 
report from the experimental situation 
and control situation with the 22.9- 
cm. lateral separation is greater than 
the difference between the average 
verbal report from the experimental 
situation and control situation with 
the 3.8-cm. lateral separation. This 
is in agreement with the results from 
using the disc. But, unlike the
results from using the disc, this
over-all result does not reach the .05
level of confidence (t = 2.0).


EXPERIMENT II 


Apparatus and procedure.—Experiment II
was identical with Exp. I in apparatus and 
procedure except that (a) a normal-sized playing 
card (9.0 cm. by 5.8 cm.) was used in the experi- 
mental and control situations wherever a half- 
sized playing card was used in Exp. I; (b) the 
playing card on S’s right was always seen 
monocularly in Exp. II, while the card on the 
left and the disc were always seen binocularly 
in Exp. II as in Exp. I; (c) 16, rather than 8, Ss 
were used in Exp. II. No objects were visible 
in S’s field of view except the disc and the two 
playing cards. 

The monocular view of the right playing card 
was produced by adjusting a black cardboard to 
obscure part of the field of view of S’s right eye. 
With his right eye S saw the disc and only the
left playing card. The disc and both playing
cards were visible with the left eye. The 
normal-sized card was used instead of the half- 
sized card to restrict the apparent depth differ- 
ence between the two cards. If this apparent 
depth difference were too large, S, in attempting 
to adjust the disc to distance equality with the 
right card, would have exceeded the fusion limit 
of the stereopsis between the disc and the left 
card by means of which the distance adjustment 
was made. The Ss were 16 men who were 
experienced in using the stereoscopic instrument. 
Seven of these had previously been used in Exp. 
I. In spite of reducing the size differences 
between the cards, one of the eight Ss previously 
used in Exp. I, in attempting to adjust the disc 
to equidistance with the right card in Exp. II, 
apparently exceeded his stereopsis fusion limit 
and was unable to make the adjustment. 
Except for this S, an S previously used in Exp. 
I was paired with each new S in all the orders 
in which the four situations were presented. 


Results.—The summarized results
in centimeters from using the disc are 
shown in Table 2. Each of the means 
of Table 2 is an average of 16 scores, 
one from each S, where each score is 
an average of 16 adjustments of the 
disc to apparent distance equality 
with a particular card in a particular 
experimental or control situation. 

TABLE 2

RESULTS IN CENTIMETERS FROM ADJUSTING THE
DISC TO APPARENT EQUIDISTANCE WITH
TWO PHYSICALLY EQUIDISTANT CARDS,
ONE BINOCULAR AND ONE MONOCULAR

As in Exp. I, the size difference
between the cards would make the
right card appear closer to S than the 
left card. When the lateral distance 
between the two differently sized 
cards was 3.8 cm., the difference 
between the means from adjusting 
the disc to the two cards was 16 cm. 
(303 cm. minus 287 cm., in Table 2). 
When the lateral distance between the 
two differently sized cards was 22.9 
cm., the mean difference was 34 cm. 
(304 cm. minus 270 cm.). The 34- 
cm. difference differed from the 16- 
cm. difference beyond the .01 level 
of confidence (t = 3.6). The differ- 
ence between the two means from a 
control situation with a particular 
lateral separation should be sub- 
tracted from the difference between 
the two means from the corresponding 
experimental situation. Referring to
Table 2, this results in a difference of
19 cm. (16 cm. minus -3 cm.) for
the 3.8-cm. lateral separation, and 36
cm. (34 cm. minus -2 cm.) for the
22.9-cm. lateral separation. The difference
of 17 cm. (36 cm. minus 19
cm.) between these two differences
represents that increase in apparent
depth between the two cards, with
the increase in lateral separation,
which can be attributed only to the
size difference between the cards.
This 17-cm. difference was significantly
different from zero beyond the
.01 level of confidence (t = 3.3).
This indicates that, in this experiment 
also, the effectiveness of the size cue 
to relative distance increased with the 
increase in visual lateral separation of 
the two cards. 

The average verbal report with the 
3.8-cm. lateral separation was that 
the left card was 19 in. behind the 
right card in the experimental situation
(SD = 30 in.) and 1 in. in front of the
right card in the control situation
(SD = 3 in.). With the 22.9-cm. lateral
separation, the average verbal report
was that the left
card was 27 in. behind the right card 
in the experimental situation (SD 
= 37 in.) and at the same distance 
as the right card in the control situ- 
ation (SD = 3 in.). The difference 
between the average verbal report 
from the experimental and control 
situation with the 22.9-cm. lateral 
separation is greater than the differ- 
ence between the average verbal 
report from the experimental and 
control situation with the 3.8-cm. 
lateral separation. This is in agree- 
ment with the results from using the 
disc. But, unlike the results from
using the disc, this over-all result
does not reach the .05 level of
confidence (t = 1.6).


DISCUSSION


In Exp. I, the size cue resulting from 
the differently sized cards was not in 
agreement with the stereopsis cue be- 
tween the cards. From the results of 
this experiment, it can be concluded that, 
with the differently sized cards, the 
relative contribution (effectiveness) of 
the size cue increased when the lateral 
separation between the cards was in- 
creased and, conversely, the relative 
contribution of the stereopsis cue de- 
creased with the increase in lateral 
separation. 

In Exp. II, the stereopsis cue between 
the two cards was effectively absent. 
It cannot be assumed, however, that in 
this experiment there were no cues which 
opposed the perceptual effects of the size 
cue. For example, the cards were of the 
same brightness, and this brightness 
factor would tend to make the cards 
appear equidistant. But to suggest that 
the brightness factor was significantly 
involved in producing the results of Exp. 
II is equivalent to concluding (paralleling 
the conclusions from Exp. I) that the 
brightness factor became perceptually 
less effective as the lateral separation of 
the differently sized cards increased. 
Rather than assuming that the effectiveness
of the size factor and that of the brightness
factor are related differently to lateral
separation, it seems reasonable to
look for some other factor whose effec- 
tiveness may have decreased as the 
lateral separation increased. Such a 
factor is discussed in a report (3) in which 
it is quantitatively demonstrated that 
there is a tendency to see a monocular 
and a binocular object as equidistant, 
with this tendency being inversely related
to lateral separation. This report contains
an application of this factor to the second
experiment of the present study.


SUMMARY 


Two experiments were conducted to investi- 
gate the change in the effectiveness of a size cue 
to relative depth as a function of the lateral 
separation of two differently sized objects. In 
one experiment, the two objects (playing cards) 
were viewed binocularly, and in the other experi- 
ment the left card was viewed binocularly and 
the right card monocularly. Two different 
amounts of lateral separation of the cards were 
used in each experiment. The change in the 
effectiveness of the size cue was determined by 
measuring the change in apparent depth between 
the two cards. The apparent depth between 
the two cards was measured by having the 
subject adjust a binocular disc to apparent dis- 
tance equality with each of the cards. Controls 
were used to determine the effect of the increased 
lateral separation upon the apparent relative 
distance of the cards when both cards were the 
same size. It was found that the average ap- 
parent depth between the two differently sized 
cards increased as the lateral visual separation 
of the cards was increased. An equivalent 
change did not occur when both cards were the 
same size. This happened both when the two 
cards were binocular and when one card was 
binocular and the other monocular. It is concluded
that, under the conditions of these experiments,
the effectiveness of size cues to
relative depth increased as the lateral separation 
of the differently sized cards was increased. 
STUDIES IN THE VISUAL DISCRIMINATION
OF MULTIPLE-UNIT DISPLAYS¹


GILBERT K. KRULEE² AND ALEXANDER WEISZ
Tufts College 


In some previously reported ex- 
periments (2), results were obtained 
that were inconsistent with the hy- 
pothesis in terms of which the experi- 
ments had initially been planned. 
The experiments to be described in 
this report were undertaken in an 
attempt to explain the unanticipated 
findings. In these earlier experiments 
the hypothesis had been advanced 
that the magnitude of the threshold 
for the discrimination of numerals 
would increase as the amount of in- 
formation transmitted (number of 
alternative possibilities) was
increased. The results indicated that
the hypothesis is correct only in part. 
While the magnitude of the threshold 
increased significantly as the amount 
of information was varied within the 
limited range of 2 to 10 categories, no 
further increases in threshold were 
obtained as the number of categories 
was increased to 16, 32, 100, and 1,000 
categories. 

In attempting to derive an alternative
explanation for the previously
obtained data, it is of interest to note
that the original hypothesis was confirmed
for those conditions involving
single-unit displays. The failure to
obtain the anticipated threshold increases
was encountered only for
conditions involving the use of multiple-unit
displays.

1 The experiments upon which this report is
based were supported by a contract between the
Office of Naval Research and Tufts College and
were undertaken as part of a program of research
on the capabilities of humans as processors
of information. The authors wish to express
their appreciation to Rachel Gordon, Walter
Silvester, Paul Perney, and Phylis Epstein for
their assistance in the prosecution of this work.

2 Now located at Case Institute of Technology.

It is significant to note that the
conventional number system is based
upon the use of 10 elemental symbols.
As long as one is restricted to the use
of single-unit displays, no more than
10 categories are possible. Flexibility
in the use of these elemental
symbols is gained by the introduction 
of multiple-digit symbols such that, 
by suitable combinations of the basic 
10 elements, an unlimited number of 
possibilities can be defined. How- 
ever, increases in the amount of in- 
formation that can be transmitted are 
obtained not by increasing the amount 
of information that can be conveyed 
with a single-digit position but by 
increasing the number of digits that 
are included in the display. 

This combinatorial aspect of the 
Arabic system of numerals suggests 
an explanation for the previously ob- 
tained findings. It is possible that, 
with respect to multiple-digit displays, 
the perceptual decisions concerning 
the several positions of the display 
can be made independently and con- 
currently. If so, it would follow that 
the magnitude of the threshold would 
be primarily a function of that digit 
position which presented S with the 
most difficult discrimination. Alter- 
natively, it would follow that vari- 
ations in threshold would be obtained 
as long as the number of possibilities 
in the most complex position was 
being manipulated. Having reached 
the limit of 10 categories in any given 
position, no further increase in threshold
would be expected from compounding
the amount of information
in that position with information 
contained in any remaining positions 
of the total display. 


METHOD 


The experimental apparatus used permitted 
the display of numbers such that a distance 
threshold for discrimination of the numerals 
could be determined. Details of this apparatus
are given in a previously published report
(2). The apparatus consisted primarily of an
enclosed box, the interior of which could be 
scanned by S through a fixed eyepiece. The 
farther end of the box was enclosed by a frosted- 
glass screen, employed to decrease the clarity of 
contours of objects viewed through it. Behind 
the screen was a movable carriage upon which 
the number displays were mounted. The car- 
riage was driven by a constant-speed electric 
motor, control of which was given to S. The 
movement of the display during a trial was 
always towards S and along his line of sight. 
The contours of the numbers were seen with 
greater clarity by S as the carriage advanced 
toward the frosted-glass screen. At the be- 
ginning of each trial, the display was positioned 
at a fixed starting point so chosen that S could 
not discriminate the numeral displayed at that 
distance. The threshold value, defined as the 
amount of travel required from the initial
starting point before recognition was possible, 
was measured by means of a counter which 
recorded the movement of the drive shaft. 
Subsequently, these threshold values were con- 
verted into distance in inches from the eye of 
S, and are so recorded in the appropriate tables 
of results. 

After E had prepared the display for a given 
trial, S was free to start the motor at his own 
convenience. The instructions to S were to 
position his head by means of the eyepiece; he 
was to continue without interruption the move- 
ment of the carriage until he could recognize 
the number displayed. When such a point was 
reached, he was to release the switch, thus 
stopping the motor, and give his response 
verbally. Instructions to S emphasized ac- 
curacy although he was also advised to give his 
response as soon as he was reasonably certain 
of his decision. No knowledge of results as to 
threshold values or accuracy was given to Ss 
during the experiments. However, threshold 
values for inaccurate responses were always 
replaced by the value obtained from an addi- 
tional presentation. 


EXPERIMENT I 


Purpose.—In this first experiment, 
the amount of information displayed 
was held constant, but the coding of 
the categories was manipulated so as 
to vary the distribution of information 
amongst the three possible positions 
in the display. The hypothesis ad- 
vanced was that the distance thresh- 
old would vary directly with the 
number of categories possible in the 
most complex position of the display. 


Method.—The Ss in both this and the fol- 
lowing experiment were graduate students and 
employees of Tufts College. They were selected 
without reference to any particular qualifica- 
tions. 

In each of the experimental variations, the 
amount of information displayed was held con- 
stant at a value of eight categories. However, 
this amount of information was coded in any 
one of three ways using either a 1-, 2-, or 3-digit 
code. With the 1-digit code, all the uncertainty 
was contained in a single position, as with a set 
of numbers of the form 11, 12, 13,..., 18. 
With such a coding of eight categories, there is 
no uncertainty in the 10’s position, while all the 
uncertainty is contained in the digits position. 
The second code used a set of numbers of the 
form 11, 12, 13, 14, 21, 22, 23, 24. In this case, 
there are four possible categories contained in 
the digits position and only two in the 10’s 
position. The third code for eight categories 
made use of a series of binary choices and had 
the form 111, 112, 121, 211, 122, 212, 221, 222. 
With this code, equal amounts of information 
are contained in each of the three positions. In 
terms of the previously stated hypothesis, the 
threshold for the first code should be highest, 
with the value for the second having an inter- 
mediate value, and the value for the 3-digit 
code being the lowest of the three. 
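The three codes for set I can be checked mechanically. The sketch below counts the distinct symbols that occur at each digit position, converts category counts to bits assuming equally likely alternatives (log2 of the number of categories), and reports the largest per-position count. Under the hypothesis stated above, that largest count should order the thresholds 1-digit > 2-digit > 3-digit. The function names are illustrative only.

```python
import math

# The three 8-category codes for set I, as listed in the text.
CODES = {
    "1-digit": ["11", "12", "13", "14", "15", "16", "17", "18"],
    "2-digit": ["11", "12", "13", "14", "21", "22", "23", "24"],
    "3-digit": ["111", "112", "121", "211", "122", "212", "221", "222"],
}


def categories_per_position(code):
    """Number of distinct symbols that occur in each digit position of a code."""
    width = len(code[0])
    return [len({item[i] for item in code}) for i in range(width)]


def bits(n_categories):
    """Information (bits) in a choice among n equally likely categories."""
    return math.log2(n_categories)


for name, code in CODES.items():
    per_position = categories_per_position(code)
    print(
        name,
        "categories per position:", per_position,
        "bits per position:", [bits(k) for k in per_position],
        "total bits:", bits(len(code)),
        "most complex position:", max(per_position),
    )
```

Every code carries 3 bits in total, but it is distributed as 0 + 3, 1 + 2, and 1 + 1 + 1 bits across positions. The per-position bits sum to the total here only because each code contains every combination of its per-position symbols.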

An additional variation was introduced in 
order to test the influence of the relative ease of 
discrimination of particular numerals. Three 
different sets of 1-, 2-, and 3-digit codes were 
constructed using different elemental symbols 
as a basis for the sets of categories. The nine 
sets of eight categories thus constructed are 
summarized in Table 1. 

As for the actual presentations, a display was
used such that any 3-digit number could be
presented. The numerals used were those designed
by Berger (1) and the height of the
numerals chosen for use was 1 in. Each S read
numbers under all nine conditions. The experimental
conditions were administered to each S
in three separate periods, separated by an interval
of one day’s duration. A given set of three
conditions was administered during a particular
period. Nine Ss were used and rotation of
conditions was introduced in order to control for
possible practice effects. This design was
balanced with respect to positional effects both
within and between periods. Within each condition,
Ss were given a sequence of eight presentations
chosen randomly with replacement
from the set of possible categories. Different
sequences were chosen for each S. Moreover,
all Ss operated within a given condition with
full knowledge of the set of possible alternatives.


TABLE 1
CONDITIONS SELECTED FOR PRESENTATION

                               Code
Set   1-Digit              2-Digit              3-Digit
I     11, 12, 13, 14,      11, 12, 13, 14,      111, 112, 121, 211,
      15, 16, 17, 18.      21, 22, 23, 24.      122, 212, 221, 222.
II    31, 32, 33, 34,      33, 34, 35, 36,      333, 335, 353, 533,
      35, 36, 37, 38.      43, 44, 45, 46.      355, 535, 553, 555.
III   21, 22, 23, 24,      15, 16, 17, 18,      222, 227, 272, 722,
      25, 26, 27, 28.      25, 26, 27, 28.      277, 727, 772, 777.
Results.—The results are summarized
in Table 2. Throughout the
analysis, the average of the scores for
the eight presentations within a given
condition was computed for each
S under the nine experimental conditions,
and the t test was applied to
the paired differences. For those hypotheses
in which the direction of the
paired differences had been initially
specified, one-tailed tests of significance
were employed. With respect
to other tests of significance, such as
for the comparison of two specific 1-digit
codes, for which the direction of
the difference had not been predicted,
two-tailed tests of significance were
employed.

For the sets of categories labeled
“I” and “II,” the results are consistent
with the initial hypothesis.
Both 1-digit codes have significantly
higher thresholds than the related 3-digit
codes. With respect to these
two sets of categories, there is a progression
of difficulty from the 3-digit
through the 1-digit codes, although
not all of the obtained differences are
statistically significant. With respect
to the third set of codes, the difference
between the 1- and 2-digit codes is in
the expected direction and significant
at the .05 level. However, the difference
between the 2- and 3-digit codes
is in a direction opposite to that
predicted.


TABLE 2
VARIATIONS IN THRESHOLD AS A FUNCTION OF THE CODING OF THE INFORMATION

                         I              II             III
Conditions         Mean    SD     Mean    SD     Mean    SD
A. 1-digit code   40.14  .095    40.16  .142    40.14  .116
B. 2-digit code   40.20  .140    40.23  .092    40.21  .113
C. 3-digit code   40.30  .113    40.24  .156    40.07  .097

Analysis of Paired Differences

Conditions  Diff.   t       Conditions  Diff.   t       Conditions  Diff.   t
IB-IA        .06   1.80     IIB-IIA      .07   2.37**   IIIB-IIIA    .07   2.42**
IC-IB        .10   3.66*    IIC-IIB      .01    .41     IIIC-IIIB   -.14   4.80*
IC-IA        .16   7.78*    IIC-IIA      .08   8.55*    IIIC-IIIA   -.07    .31

Conditions  Diff.   t       Conditions  Diff.   t       Conditions  Diff.   t
IA-IIA      -.02    .70     IB-IIB      -.03    .59     IC-IIC       .06   1.65
IIA-IIIA     .02    .47     IIB-IIIB     .02    .95     IC-IIIC       ?    5.29*

* Significant at .01 level of confidence, based on 8 df.
** Significant at .05 level of confidence, based on 8 df.

A possible explanation for the 
anomalous results obtained for the 
third set of codes is suggested by the 
results for variations in the specific 
symbols used with the type of code 
held constant. It is apparent from 
the results that the discrimination 
between “2” and “7” is significantly 
more difficult than the other binary 
choices which Ss were asked to make. 
Thus, with respect to the third set of 
codes, it seems reasonable to infer that 
the advantage to be gained from use 
of a 3-position code was in large part 
offset by the difficulty of the particular 
binary choice which was selected for 
use with that code. 


EXPERIMENT II 


Purpose.—A second inference that 
can be drawn from the initially stated 
hypothesis is that the threshold for 
2- or 3-digit numbers should be the 
same as that obtained for 1-digit 
numbers, provided that the discrimi- 
nation required in the 10’s or 100’s 
position does not exceed in complexity 
the discrimination required in the 
digits position. This inference is born 
out by the previously reported data 
(2) for 10, 100, and 1,000 categories. 
Similarly, if the discrimination re- 
quired at each position in a multiple- 
unit display is limited to a binary 
choice, then the threshold obtained 
for a 3-digit binary number should not 
exceed that obtained for the same 
binary choice when confined to a 
single position display. 


Method.—For this experiment, six alternative
pairs of numbers were selected and used to 
define 1-digit and 3-digit binary numbers. For 
example, one such pair of sets was constructed 
from the symbols 5 and 9. For the 1-digit 
display the admissible alternatives were 5 or 9. 
For the associated 3-digit display, the set of 
possibilities was 555, 559, 595, 955, 599, 959, 
995, 999. Alternative pairs of sets were con- 
structed in a similar fashion. These sets are 
shown in Column 1 of Table 3. For a pair of 
sets constructed from the same binary discrimination,
the 3-digit discrimination conveys
more information than the 1-digit discrimination:
eight categories vs. two. However, if
one considers the amount of information con- 
tained in the most complex position of either 
set, then the displays are equivalent since the 
discrimination required of each position is held 
constant at a value of two categories. 
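The contrast between the two displays can be made concrete with a short calculation. The sketch below enumerates the 1-digit and 3-digit sets built from the symbols 5 and 9 and expresses each in bits. It assumes the conventional information-theory measure, log2 of the number of equally likely alternatives. The source gives the category counts (eight vs. two) but not this formula, and the helper names are hypothetical.

```python
import math
from itertools import product

def binary_code_set(symbols, length):
    """All strings of the given length over a two-symbol alphabet."""
    return ["".join(p) for p in product(symbols, repeat=length)]

def bits(n_alternatives):
    """Information, in bits, of a choice among equally likely alternatives."""
    return math.log2(n_alternatives)

one_digit = binary_code_set("59", 1)
three_digit = binary_code_set("59", 3)

print(one_digit)              # ['5', '9']
print(sorted(three_digit))    # ['555', '559', '595', '599', '955', '959', '995', '999']
print(bits(len(one_digit)), bits(len(three_digit)))  # 1.0 3.0

# Number of distinct symbols that can occupy each position of the 3-digit display.
per_position = [len({code[i] for code in three_digit}) for i in range(3)]
print(per_position)                          # [2, 2, 2]
print(max(bits(k) for k in per_position))    # 1.0
```

The whole 3-digit display carries 3 bits, but its most complex position still carries only 1 bit, the same as the 1-digit display.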

The apparatus described previously was 
suitable for use without modification in this 
experiment. All Ss read numbers under each 
of the 12 experimental conditions. These were 
administered in two separate periods, in each 
of which six conditions were administered. 
Twelve Ss were used and rotation of conditions 
was introduced in order to control for possible 
practice effects. The actual rotations used 
made it possible to balance both with respect to 
within-period and between-period effects.
Within each condition, Ss were given a sequence 
of eight presentations chosen randomly with 
replacement. Different sequences were chosen 
for each S. As before, all Ss operated within a 
given condition with full knowledge of the set 
of possible alternatives. 


Results.—The results for the comparison of 1-digit vs. 3-digit binary numbers are summarized in Table 3.
As before, the analysis was based upon 
paired differences for each S. Hy- 
potheses were tested by means of the 
t test, using two-tailed tests of sig- 
nificance in all cases. For five out 
of the six comparisons the results are 
as expected in that the obtained 
thresholds are not significantly dif- 
ferent from each other. No expla- 
nation has been advanced for the one 
significant difference obtained, al- 
though, by chance results alone, it 
could be expected with a probability 
of approximately .20 that one out of 
six t tests would reach significance.
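The chance figure quoted above can be checked with the binomial distribution. This is a rough sketch under two assumptions the source does not state. First, it treats the six t tests as independent, which is only approximate because the same 12 Ss served in every comparison. Second, it computes the figure for both a .05 and a .01 criterion, because the source does not say which level its ".20" refers to.

```python
from math import comb

def prob_exactly_k(n_tests, k, alpha):
    """Binomial probability that exactly k of n independent tests reach significance
    when every null hypothesis is true and each test has false-positive rate alpha."""
    return comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)

def prob_at_least_one(n_tests, alpha):
    """Probability that at least one of n independent tests reaches significance by chance."""
    return 1 - (1 - alpha) ** n_tests

for alpha in (0.05, 0.01):
    print(alpha, round(prob_exactly_k(6, 1, alpha), 3), round(prob_at_least_one(6, alpha), 3))
# 0.05 0.232 0.265
# 0.01 0.057 0.059
```

Only the .05 criterion gives a value near .20. At the .01 level, which the one significant difference actually reached (Table 3), the chance expectation is considerably smaller.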
TABLE 3


DISTANCE THRESHOLD: 1-DIGIT vs. 3-DIGIT
BINARY CHOICE


Categories | 1-Digit Mean | 1-Digit SD | 3-Digit Mean | 3-Digit SD | t
A. 1 & 8 | 40.40 | .236 | 40.37 | .206 | .35
B. 1 & 2 | 40.32 | .139 | 40.33 | .169 | .38
C. 0 & 3 | 40.32 | .096 | 40.24 | .118 | 3.15*
D. 5 & 9 | 40.22 | .113 | 40.20 | .193 |
E. 4 & 6 | 40.20 | .140 | 40.25 | .128 | 1.35
F. 2 & 7 | 40.11 | .140 | 40.08 | .130 | .72

* Significant at .01 level of confidence, based on 11 df.


Results for a comparison of the 
relative difficulties of each pair of 
symbols are contained in Table 4. 
This comparison of relative difficulty 
of discrimination was carried out 
separately for the six 1-digit displays
and for the six 3-digit displays. 

It is apparent from these results 
that there are significant differences 
between pairs with respect to the 
difficulty of the discrimination confronting the Ss. It is also apparent that the set of rank orderings of the six codes obtained for the 1-digit displays is not identical with that obtained for the 3-digit displays. Although the sets yielding the extreme values of threshold maintain their relative positions in both rank orderings, the relative difficulty of certain other sets shifts.


TABLE 4


DISTANCE THRESHOLD AS A FUNCTION OF
CHOICE OF CATEGORIES


1-Digit Pair | Paired Difference | t | 3-Digit Pair | Paired Difference | t
A-B | .08 | 1.12 | A-B | .04 | 1.45
B-C | .00 | .33 | B-E | .08 | 2.12**
C-D | .10 | | E-C | .01 | .40
D-E | .02 | .41 | C-D | .04 | .81
E-F | .09 | 3.46* | D-F | .12 | 2.44**

* Significant at .01 level of confidence, based on 11 df.
** Significant at .05 level of confidence, based on 11 df.


In particular, set “E,” which has a
relatively high threshold in the 1-digit 
series, has a relatively lower threshold 
as a 3-digit display. Although no 
explanation for this shift in rank 
orderings can be advanced at this 
time, it is possible that these results 
are attributable in part to the per- 
ceptual differences between the task 
of distinguishing a single symbol in 
isolation as compared to that of dis- 
tinguishing several symbols in close 
proximity to each other. 


EXPERIMENT III 


Purpose.—In the previously reported experiments (2), the highest
threshold obtained has been that for 
10 categories. The foregoing analysis, 
however, would suggest that this 
result is an artifact of the condition 
that the system of Arabic numbers 
contains only 10 elemental symbols. 
With a set of symbols containing a 
number of elements in excess of 10, it 
could be predicted that the threshold 
would increase as long as increases in 
number of possible categories were 
accomplished by use of a single-unit 
display. 


Method.—By use of an augmented alphabet 
containing 32 elements, it was possible to de- 
termine distance thresholds for conditions of 4, 
8, 16, and 32 categories, under circumstances 
such that only single-unit displays were used. 
The augmented alphabet was composed of the 
26 conventional letters plus the numbers 2 
through 7. For the purpose of this experiment 
an alphabet of Gothic letters without serifs was 
used. These letters were 1 in. in height and 
had a stroke-width ratio of 5:1. Since the 
Berger numerals differ in stroke-width ratio from 
this alphabet, a set of numerals having similar 
characteristics was designed for inclusion with 
this alphabet. In addition a minor modification 
to the experimental apparatus had to be under- 
taken. The original carriage was replaced with 
one more suitable for presentation of selections 
from this 32-letter alphabet. This particular 
display held one symbol at a time locked into 
place behind a clear glass window. 
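The composition of the augmented alphabet, and the amount of information each Exp. III condition carries, can be sketched as follows. The sketch assumes the same log2 measure of information for equally likely alternatives. The variable name is hypothetical.

```python
import math
import string

# 26 conventional letters plus the numerals 2 through 7, as described for Exp. III.
AUGMENTED_ALPHABET = list(string.ascii_uppercase) + [str(d) for d in range(2, 8)]

assert len(AUGMENTED_ALPHABET) == 32

for n_categories in (4, 8, 16, 32):
    # Each condition is a single-unit display, so all of the information sits in
    # one position: log2(n) bits for n equally likely symbols.
    print(n_categories, math.log2(n_categories))
# 4 2.0
# 8 3.0
# 16 4.0
# 32 5.0
```

Because 32 is a power of two, the four conditions step through whole numbers of bits (2, 3, 4, 5) in a single position.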

In this experiment, Ss read symbols under each of the four conditions. The Ss in both this and the final experiment to be described were naval enlisted personnel (male) obtained through the cooperation of the United States Naval Receiving Station, Boston, Massachusetts. Sixteen Ss were used. Within each condition Ss were given a sequence of eight randomly chosen selections with replacement. Different sequences were chosen for each S and all presentations were made only after S had been instructed fully as to the set of possible alternatives from which presentations were being made.


TABLE 5


DISTANCE THRESHOLD vs. INFORMATION
PROCESSED


Number of Categories | Mean | SD | Paired Differences | t
A. 4 | 41.22 | .260 | |
B. 8 | 41.05 | .182 | A-B .17 | 3.38*
C. 16 | 41.04 | .196 | B-C .01 | .21
D. 32 | 40.99 | .415 | D-C .05 | 1.43

* Significant at .01 level of confidence, based on 15 df.

Rotation of conditions was introduced in 
order to control for possible practice effects. In 
addition, care was taken to ensure that
any differences in the relative ease of discrimi- 
nation of particular symbols would be controlled. 
This was accomplished by a design which was 
balanced as to the frequency of occurrence of 
any specific symbol. This scheme of balancing 
has the characteristic that, for each S, the sets 
of symbols chosen for 4, 8, and 16 categories 
have no members in common. However, for 
the condition of 32 categories, the set of symbols 
does contain elements in common with those 
contained in the other sets. 
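The overlap property described above can be illustrated with a short sketch. For one S, the 4-, 8-, and 16-category sets are disjoint and use 28 of the 32 symbols, while the 32-category set must contain all of them. This is a hypothetical assignment made by shuffling the alphabet. It does not reproduce the source's scheme for balancing symbol frequencies across Ss, which is not described in enough detail to implement, and the function name is invented for illustration.

```python
import random
import string

ALPHABET = list(string.ascii_uppercase) + [str(d) for d in range(2, 8)]

def disjoint_condition_sets(seed=None):
    """Hypothetical assignment for one S: disjoint symbol sets for 4, 8, and 16
    categories (4 + 8 + 16 = 28 of the 32 symbols). The 32-category condition
    necessarily reuses every symbol."""
    rng = random.Random(seed)
    shuffled = ALPHABET[:]
    rng.shuffle(shuffled)
    return {4: shuffled[:4], 8: shuffled[4:12], 16: shuffled[12:28], 32: ALPHABET[:]}

sets = disjoint_condition_sets(seed=1)
small = [set(sets[n]) for n in (4, 8, 16)]
print(any(a & b for i, a in enumerate(small) for b in small[i + 1:]))  # False
print(all(s <= set(sets[32]) for s in small))                          # True
```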


Results.—The results for the comparison of thresholds for each condition are summarized in Table 5.
As before, the analysis was based upon 
the paired differences of the mean 
scores obtained for each S under the 
four experimental conditions. It is 
apparent that, although a significant 
increase in threshold is obtained for 
eight as compared to four categories, 
no further significant increases are 
obtained for 16 and 32 categories. 
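The paired-difference analysis used throughout these experiments reduces to a one-sample t test on each S's difference score. The sketch below is a standard textbook version and is not taken from the source. The data in the usage line are invented.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired-difference t statistic and degrees of freedom.
    scores_a[i] and scores_b[i] are the same S's mean scores under two conditions."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two Ss")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1

# Invented scores for four Ss, purely to show the call.
t, df = paired_t([3.0, 5.0, 7.0, 9.0], [2.0, 3.0, 4.0, 5.0])
print(round(t, 3), df)  # 3.873 3
```

With 16 Ss the test has 15 df, which matches the footnote to Table 5. The resulting t is then compared against a two-tailed critical value.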

Since the trend of these results was 
in the expected direction of increasing 
thresholds with increasing amounts of 
information, an analysis of variance 
procedure was carried out on these data. The procedure used for the
analysis of variance was a method de- 
signed for the analysis of a series of 
latin squares (3). In this case the 
experimental data could be summa- 
rized in terms of four identical latin 
squares. In terms of this analysis 
procedure, significant results are ob- 
tained for the influence of amount of 
information on the magnitude of the 
obtained thresholds (F = 9.21, significant at the .01 level of confidence with 3 and 27 df).


EXPERIMENT IV 


Purpose.—This final experiment
was carried out in order to verify a 
possible explanation for the results 
obtained in Exp. III. One particular 
consequence of the method followed 
in that experiment for controlling the 
frequency with which particular sym- 
bols appear is that any given S ex- 
periences primarily sets of categories 
which do not contain any specific 
symbols in common. For each S the 
sets of symbols for 4, 8, and 16 categories have no members in common.
It is only for the condition of 32 cate- 
gories that common symbols are once 
again permitted. From the data of 
Exp. I and II, it is clear that certain sets of symbols are more difficult
than others to discriminate. Thus, 
the difficulty of a discrimination is 
apparently a function of at least two 
variables: amount of information and 
the inherent perceptual difficulty of 
the symbol discriminations. The 
complex design followed in Exp. III 
introduces for any given S variations 
due to the perceptual difficulties of 
symbol discriminations. It is con- 
ceivable that these variations tend to 
obscure the effect of amount of infor- 
mation and thus to interfere with the 
main purpose of the experiment. A 
method for validating this point of 
view would be to ensure that, for 
conditions of increasing amounts of 
information, the previously used set
of symbols would always be imbedded 
in a larger set of possibilities. With 
such a design, it should be possible to 
obtain increases in threshold with each 
increase in the size of the set of ele- 
mental symbols to be discriminated. 


Method.—Using the same alphabet and ap- 
paratus as described in Exp. III, thresholds 
were determined for conditions of 8, 16, and 32 
categories. Eight Ss were administered con- 
ditions of 8 and 16 categories, while a separate 
group of eight Ss received conditions of 16 and 
32 categories. For each group of Ss rotations 
of the form AB, BA were introduced so as to 
control for possible practice effects. The only 
difference from the conditions of Exp. III was in the manner in which the sequence of presentations was chosen.

With respect to the conditions for 8 vs. 16 
categories, different sets of presentations were 
defined for each S in the following manner. 
For 8 categories, the possibilities were limited 
to the set of elements A-H. Eight presen- 
tations were chosen from this set with replace- 
ment, except for the limitation that at least six 
different elements appear in the sequence of 
presentations. For the condition of 16 cate- 
gories, the possibilities were limited to the set 
of elements A-P and a sequence of 16 presen- 
tations was given to each S. Of these pre- 
sentations, eight elements in the sequence were 
identical with the sequence of presentations 
chosen from amongst the eight categories. The 
other elements in the sequence were chosen 
randomly with replacement from the set I-P 
plus any letters in the set A-H which had not 
previously appeared in the sequence drawn for 
the condition of eight categories. However, 
the 16 elements in the resulting sequence were 
scrambled so as to obscure the fact that it was 
in any way related to the previous sequence. 
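The sequence construction just described can be sketched in code. The procedure draws a small sequence with a minimum number of distinct elements, then embeds it among fillers drawn from the new symbols plus any unused symbols of the small set, and finally scrambles the result. The sketch enforces the distinctness limit by redrawing until it is met (rejection sampling). That is an assumption, since the source does not say how the limitation was enforced. All function names are hypothetical.

```python
import random

def letters(first, last):
    """Inclusive range of capital letters, e.g. letters("A", "H")."""
    return [chr(c) for c in range(ord(first), ord(last) + 1)]

def draw_small(pool, n, min_distinct, rng):
    """Draw n items with replacement, redrawing until at least min_distinct differ."""
    while True:
        seq = [rng.choice(pool) for _ in range(n)]
        if len(set(seq)) >= min_distinct:
            return seq

def embed(small_seq, small_pool, new_pool, rng):
    """Larger-condition sequence: the small sequence plus an equal number of fillers
    drawn with replacement from the new symbols and any small-pool symbols that did
    not appear, then scrambled to hide the relationship."""
    filler_pool = new_pool + [s for s in small_pool if s not in small_seq]
    fillers = [rng.choice(filler_pool) for _ in range(len(small_seq))]
    big = small_seq + fillers
    rng.shuffle(big)
    return big

rng = random.Random(7)
seq8 = draw_small(letters("A", "H"), 8, 6, rng)          # 8 categories, at least 6 distinct
seq16 = embed(seq8, letters("A", "H"), letters("I", "P"), rng)

# The 16 vs. 32 comparison described in the following paragraph uses the same steps.
seq16b = draw_small(letters("A", "P"), 16, 13, rng)       # at least 13 distinct
seq32 = embed(seq16b, letters("A", "P"), letters("Q", "Z") + [str(d) for d in range(2, 8)], rng)
print(len(seq8), len(seq16), len(seq16b), len(seq32))     # 8 16 16 32
```

Because the fillers never include a symbol already used in the small sequence, every occurrence of those symbols in the larger sequence is a carried-over presentation.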

A similar method was followed in choosing 
the sequences for the conditions of 16 vs. 32 
categories. For each S a sequence of 16 se- 
lections was first drawn from the set A-P with 
replacement except for the limitation that at 
least 13 different elements must appear in the 
sequence. For the condition of 32 categories, 
the possibilities included all elements in the 
augmented alphabet and a sequence of 32 
elements was presented to S. Of these pre- 
sentations, 16 elements were identical with 
those previously presented to S as the sequence 
for 16 categories. The other 16 elements were 
chosen from the set Q-Z, 2-7, plus any letters 
in the set A-P which had not previously appeared. Different sequences were chosen in this manner for each S. As before, the 32 elements in the sequence were scrambled so as to obscure any relationship to the previously obtained sequence. For both parts of the experiment, full knowledge was always given to S concerning the set of categories from which the sequence of choices had been made.


TABLE 6


DISTANCE THRESHOLD vs. INFORMATION
PROCESSED


Number of Categories | Mean | SD | Paired Differences | t
A. 8 | 41.12 | .163 | |
B. 16 | 41.01 | .172 | A-B .11 | 3.83*
C. 16 | 41.01 | .120 | |
D. 32 | 40.90 | .109 | C-D .11 | 3.59*

* Significant at .01 level of confidence, based on 7 df.


Results.—The results are summarized in Table 6. For each part of
the experiment, the analysis always 
involved a comparison of the thresh- 
old for a specific set of elements when 
obtained (a) in isolation, and (b) as a 
subset imbedded in a larger set of 
categories. For example, a particular 
S received eight presentations under 
the condition of eight alternative cate- 
gories. He also received a set of 16 
presentations which included the eight 
he had received under the simpler 
condition plus eight chosen from the 
larger set of 16 possibilities. The 
average of the eight scores in isolation 
was determined and compared to the 
average for the identical set of eight 
presentations when imbedded in the 
larger set of alternatives. The val- 
ues obtained for any elements in the 
sequence for 16 categories that had 
not appeared in the sequence for the 
condition of eight categories were dis- 
carded. A similar procedure was fol- 
lowed in carrying out the comparison 
of 16 vs. 32 categories. As is ap- 
parent from inspection of the data, 
significant increases were obtained for
8 vs. 16 and 16 vs. 32 categories and 
the hypothesis relating the magnitude 
of the distance threshold to the num- 
ber of possible alternatives was con- 
firmed for a display limited to the 
presentation of single-unit symbols. 
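The matched comparison described in these results can be sketched for a single S as follows. The sketch keeps only those scores from the larger condition whose symbols appeared in the smaller sequence and discards the rest. It then averages both sets. It assumes each trial is recorded as a (symbol, score) pair, which is a hypothetical data layout. The per-S pairs of means would then go into a paired t test.

```python
from statistics import mean

def matched_means(isolation, embedded):
    """isolation, embedded: lists of (symbol, score) pairs for one S.
    Returns (mean score in isolation, mean score for the same symbols when embedded).
    Embedded trials on symbols absent from the isolation sequence are discarded."""
    kept_symbols = {sym for sym, _ in isolation}
    kept = [score for sym, score in embedded if sym in kept_symbols]
    if not kept:
        raise ValueError("no carried-over presentations found")
    return mean([score for _, score in isolation]), mean(kept)

# Invented scores, purely to show the filtering.
print(matched_means([("A", 1.0), ("B", 2.0)],
                    [("B", 3.0), ("J", 9.0), ("A", 1.0), ("K", 5.0)]))  # (1.5, 2.0)
```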


DISCUSSION


In this series of experiments, two 
general hypotheses have been examined, 
both of which relate to the visual dis- 
crimination of letters and numerals. 
The first hypothesis advanced was that, 
in single-unit displays, the magnitude of 
a distance threshold for the recognition 
of elemental symbols is directly related 
to the amount of information involved 
in the recognition of that particular 
symbol. In some previously reported 
experiments (2) this hypothesis was 
verified for numerals within the limited 
range of two to 10 alternative possi- 
bilities. As long as discrimination of 
numerals was involved, extension of the 
range of applicability of this hypothesis 
could not be undertaken, since the con- 
ventional set of Arabic numerals contains 
only 10 elemental symbols suitable for 
use with a 1-digit display. Further ex- 
perimentation has been undertaken with 
a 32-letter alphabet which in part con- 
firms the hypothesis and in part suggests 
the advisability of some revision. 

With this alphabet, increases in thresh- 
old were obtained for 8 vs. 16 and 16 vs. 
32 categories provided that the compari- 
son was based upon the discrimination of 
a set of elemental symbols (a) in isolation, and (b) as part of a larger set of
alternatives. From data obtained when 
such restrictions were not imposed, it 
was apparent that this hypothesis does 
not necessarily hold. For example, it 
can be inferred from the data that not 
every set of eight categories has a lower 
threshold than every set of 16 cate- 
gories, provided that the latter set does 
not include as members the specific 
symbols included in the former. In addition, data demonstrating that some two-alternative discriminations are more difficult than others support the view
that the difficulty of discrimination is 
only in part a function of number of 
alternative categories. Threshold values 
also depend upon the difficulty of the 
discriminations amongst the specific 
symbols chosen to define a particular set 
of alternatives. 

The second hypothesis considered was 
that the recognition of multiple-unit 
displays involves independent and con- 
current processes such that the distance 
threshold is primarily a function of that 
position in the display containing the 
most difficult discrimination. This hy- 
pothesis was suggested by previously 
obtained data (2) which indicated that 
no difference in threshold can be de- 
tected for 1- vs. 2- vs. 3-digit numbers, 
provided that each position contained 
the same number (10) of alternative 
possibilities. Data were obtained for 
conditions in which eight categories were 
displayed in terms of 1-digit, 2-digit, and 
3-digit codes. For two of the three sets 
of codes thus defined, the results did 
indicate decreases in threshold for the 1- 
vs. 2- vs. 3-digit codes. However, for 
the third set of codes, the difficulty of 
discriminating between the symbols em- 
ployed for the binary choice in the 3- 
digit codes apparently offset any gain 
resulting from reduction of the number 
of possible alternatives at each position 
in the 3-digit display. 

Additional data relevant to this hy- 
pothesis were also obtained from the 
thresholds for several binary choices used 
both as 1-digit and as 3-digit displays. 
In five out of six such comparisons, the 
3-digit discrimination was not signifi- 
cantly different from the related 1-digit 
discrimination. No explanation has been 
advanced for the fact that for the code 
based upon the symbols “0” and “3” 
the 3-digit threshold was significantly 
greater than that obtained for the 1-digit 
choice. 

One clear-cut implication of all of these 
findings is that hypotheses concerning 
distance thresholds must involve not only 
the number of alternative possibilities 
but also the specific difficulty of dis- 
criminating among the elements chosen 
to define a set of possible alternatives.
For purposes of prediction, number of 
alternatives is a relatively unambiguous 
concept having clear-cut quantitative 
properties. However, the difficulty of 
choices among specific elemental symbols 
is most easily defined on operational 
grounds and not on the basis of theoretical (a priori) considerations. It would be desirable to develop a broader theory of the visual discrimination of symbols which would include both types of variables and which would increase the predictive value of the existing theories of visual displays.


SUMMARY 


Four experiments have been reported on the 
determination of distance thresholds for 1-, 2-, 
and 3-digit displays as a function of the number 
of alternative possibilities in each position of the 
display. In the first experiment, eight cate- 
gories were defined in terms of 1-, 2-, and 3-digit 
codes using different choices of elemental sym- 
bols to define three such sets of codes. With 
two of the three sets, results were obtained 
indicating that this threshold is directly related 
to the amount of information contained in the 
most complex of the three positions. For the 
third set, the apparent difficulty of the specific 
binary choice chosen for the 3-digit code made 
this the most difficult of the three codes in the set. 
In a second experiment, thresholds for several binary choices used both as 1-digit and as 3-digit displays were obtained. With five out of
six such comparisons no significant differences 
could be detected for the 3- vs. the 1-digit binary 
choice. 

In the final two experiments, an alphabet of 
32 symbols was used in order to investigate the 
relationship of threshold magnitude to amount of information, provided that only single-position displays were used. Increases in threshold were obtained for 8 vs. 16 and 16 vs. 32 categories
when the comparison was based upon the dis- 
crimination of a set of alternatives first in 
isolation and finally as part of a larger set of 
alternatives. The data indicate that such 
threshold increases are not necessarily obtained 
when the larger set does not contain as members 
the specific symbols included in the smaller. 
FACTOR ANALYSIS OF MEANING


CHARLES E. OSGOOD AND GEORGE J. SUCI 


Institute of Communications Research, University of Illinois 


Although there are objective methods for studying many aspects of
language behavior, a survey of the 
literature (3) failed to uncover any 
standard ways of measuring meaning 
which meet the usual criteria of measurement. Perhaps because “meanings” are generally assumed to be
uniquely and infinitely variable, or 
perhaps because of the philosophical 
haziness of this concept, there have 
been few attempts to devise methods 
here. Nevertheless, whether looked 
at from the viewpoints of philosophy 
or linguistics, from political or soci- 
ological theory, or—interestingly 
enough—from within the core of 
psychological theory itself, the nature 
of meaning and change in meaning 
are central issues. 

This paper is one of a series de- 
scribing research on the development 
of an objective method of measuring 
meaning. A previous report gave a 
review of this problem, describing 
attempts to establish physiological, 
learning, and associational indices, 
and also summarized several theo- 
retical conceptions of the sign process. 
Subsequent papers will be concerned 
with evaluations of the validity, re- 
liability, sensitivity, and generality 
of the method proposed here, and will 
describe some applications of the 
method. The present paper presents 
the results of two independent factor 
analyses of semantic judgments. The 
first used Thurstone’s centroid 
method (5) and the second a method 
of analysis recently developed by the 
second author. Both of these analy- 
ses, based on independent samples of 
subjects and different procedures of 
judgment, yield similar factor structures, indicating some degree of
stability in the semantic factors un- 
covered so far. 
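For readers unfamiliar with Thurstone's centroid method, its first step has a simple closed form. Each variable's loading on the first centroid factor is its column sum in the correlation matrix divided by the square root of the sum of all entries. The sketch below extracts only that first factor and its residual matrix. It assumes the diagonal already holds communality estimates, and it omits the sign reflection and repeated extraction used for later factors. The source does not give those procedural details here.

```python
import math

def first_centroid_factor(R):
    """First centroid-factor loadings from a correlation matrix R (list of lists)
    whose diagonal already holds communality estimates:
    a_j = (sum of column j) / sqrt(sum of all entries)."""
    col_sums = [sum(row[j] for row in R) for j in range(len(R))]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("centroid method needs a positive grand sum; reflect variables first")
    root = math.sqrt(total)
    return [s / root for s in col_sums]

def residuals(R, loadings):
    """Correlations remaining after removing the extracted factor."""
    n = len(R)
    return [[R[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]

# A made-up matrix generated by a single factor with loadings .9, .8, .7.
R = [[0.81, 0.72, 0.63],
     [0.72, 0.64, 0.56],
     [0.63, 0.56, 0.49]]
a = first_centroid_factor(R)
print([round(x, 3) for x in a])  # [0.9, 0.8, 0.7]
```

For this one-factor matrix the residuals are zero up to rounding. With real data they would be examined for further factors.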


LOGIC OF THE SEMANTIC
DIFFERENTIAL 


The purpose of our factor analytic 
work is to devise a scaling instrument 
which gives representation to the 
major dimensions along which mean- 
ingful reactions or judgments vary. 
In the course of several applications 
of preliminary forms of this measuring 
instrument, it has acquired the label, 
semantic differential. Since this label 
points quite accurately to the intended 
operation—a multivariate differentia- 
tion of concept meanings in terms of a 
limited number of semantic scales of 
known factor composition—we shall 
continue this usage. This term is not 
to be confused with the general 
semanticist’s structural differential 
which involves logical operations of 
a very different order. 

The semantic differential had its 
origin in research on synesthesia (2). 
In these studies it was found that the 
process of translating from musical 
stimulus to “visual” response, for 
example, could be described as the 
parallel alignment in thinking of two 
or more dimensions of experience, each 
defined in terms of polar opposites 
(high-low, hot-cold, loud-soft, light- 
dark, etc.), with translations occurring 
between equivalent portions of these 
related continua. It was shown that 
this process is not limited to rare 
synesthetic individuals, rather being 
quite general and consistent in the 
population and congruent with standard systems of metaphor in the culture. Subsequent studies on the changing meaning of social stereotypes during the involvement of this country in World War II (4) made the notion of scaled continua between these polar terms explicit in a graphic method of collecting data and also constituted our first attempt to measure the meaning of concepts against a system of semantic dimensions. This background has been described in more detail in the earlier paper in this series (3).

Deriving from this earlier work, the 
logical basis of the semantic differ- 
ential is as follows: 

1. The process of description or 
judgment can be conceived as the allo- 
cation of a concept to an experiential 
continuum, definable by a pair of 
polar terms. This process reveals 
itself in the behavior of synesthetic 
subjects and in ordinary language 
metaphor, as well as in the more 
refined judgments elicited in psycho- 
physical experiments. The content 
of many complex linguistic assertions 
(e.g., “I don’t think these Chinese 
Communists are to be trusted”) can 
be reduced to the allocation of a 
concept to a scale, e.g., 

CHINESE COMMUNISTS: 


trustworthy ___:___:___:___:___:_X_:___ untrustworthy.





The greater the intensity of particular 
assertions (e.g., “These Chinese Communists are completely untrustworthy”), the more extreme becomes
the allocation toward one or the other 
of the polar terms. This relation of 
graphic extremeness to intensity or 
strength of association will be de- 
tailed in a subsequent report. The 
process of judgment here has much 
in common with the single-stimulus 
or absolute-judgment method in 
psychophysics. Subjects use the dif- 
ferential in ways suggesting that they 
“carry about” stabilizing frames of 
reference based upon a lifetime of making such judgments, i.e., each “absolute” judgment of a particular
concept on a particular scale is really 
a comparative judgment against a 
multitude of previous concept-scale 
allocations. 
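The allocation of a concept to a scale can be turned into a number once the check position is known. The sketch below reads the position of the check mark from a text rendering of a 7-space scale and maps it to a signed score. Position 1 scores +3 and position 7 scores -3. The string format and the signed scoring are assumptions made for illustration, not the authors' scoring procedure. The magnitude of the score corresponds to the extremeness of the allocation discussed above.

```python
def check_position(row):
    """1-based position of the check mark in a hypothetical rendering of a
    7-space scale such as '___:___:___:___:___:_X_:___'."""
    cells = row.split(":")
    if len(cells) != 7:
        raise ValueError("expected 7 spaces separated by 6 colons")
    marked = [i for i, cell in enumerate(cells, start=1) if "X" in cell.upper()]
    if len(marked) != 1:
        raise ValueError("expected exactly one check mark")
    return marked[0]

def polarity_score(position, left_pole_is_positive=True):
    """Hypothetical signed score: +3 at the left pole, 0 at the middle space,
    -3 at the right pole (sign reversed if the left pole is the negative term)."""
    score = 4 - position
    return score if left_pole_is_positive else -score

# CHINESE COMMUNISTS on trustworthy-untrustworthy, checked in the sixth space.
pos = check_position("___:___:___:___:___:_X_:___")
print(pos, polarity_score(pos))  # 6 -2
```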

2. Many different experiential con- 
tinua, or ways in which meanings can 
vary, are essentially equivalent and 
hence may be represented by a single 
dimension. In the example given 
above, the specific scale trustworthy— 
untrustworthy would presumably ap- 
pear as an essentially evaluative 
judgment—the same speaker might 
well have said, “Chinese Communists
are no good.” This functional equiva- 
lence of many alternate modes of 
semantic judgment was clearly evi- 
dent in both the studies on synesthesia 
and those on the changing structure 
of social stereotypes. In the latter 
case, for instance, six of eight scales 
used on one form intercorrelated .90 
or better (fair—unfair, high-low, kind- 
cruel, valuable—worthless, Christian— 
anti-Christian, and honest—dishonest), 
clearly indicating the existence of a 
generalized evaluative factor. It is 
this characteristic of language and 
thinking that makes the development 
of a quantitative measuring instru- 
ment feasible. 
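The functional equivalence of scales is expressed as a correlation between two scales' ratings. The sketch computes a Pearson correlation across concepts using invented ratings on a 1-to-7 scoring. Correlating across concepts is only one possible choice, because the source does not say over which units the stereotype-study correlations were computed. The data and names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented ratings of five concepts on three scales (1 = left pole, 7 = right pole).
fair_unfair = [1, 2, 6, 7, 4]
honest_dishonest = [1, 3, 6, 7, 4]
heavy_light = [4, 6, 3, 5, 2]

print(round(pearson(fair_unfair, honest_dishonest), 2))
print(round(pearson(fair_unfair, heavy_light), 2))
```

With these made-up ratings the two evaluative scales correlate very highly, while the sensory scale is nearly unrelated to them. That is the pattern a single evaluative factor would produce.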

3. A limited number of such continua 
can be used to define a semantic space 
within which the meaning of any concept 
can be specified. This statement spec- 
ifies some variant of factor analysis 
as the basic methodology. If it can 
be demonstrated that some limited 
number of dimensions or factors is 
sufficient to differentiate among the 
meanings of randomly selected con- 
cepts, and if the scale system finally 
selected satisfies the usual criteria of 
measurement, then the data obtained 
with such a semantic differential 
become an operationally defined index 
of meaning. In the present instance, 
the operations can be made explicit 
and thereby repeatable (see under 
“Procedure” below). It is, of course,
true that one cannot define funda- 
mental factors in any final form on 
the basis of a single factor-analytic 
study; it is necessary to demonstrate 
repeatedly, over independent samples 
of subjects, concepts, and descriptive 
continua, that essentially the same 
sets of factors appear and in approxi- 
mately the same relations. In this 
paper we report the results of two 
such studies.


ANALYSIS I: CENTROID METHOD,
GRAPHIC DATA


Since the purpose of our factor 
analysis of meaning is to discover the 
“natural” dimensionality of the se- 
mantic space, i.e., the system of 
factors which together account for 
variance in semantic judgments, it is 
important to obtain as representative 
a sampling of scales, concepts, and 
subjects as possible. The nature and 
number of factors obtained in any 
analysis is limited by the sources of 
variability in the original data, and 
we wished to avoid both the pro- 
duction of artificial factors and the 
omission of significant ones through 
biased sampling. 


Sampling.—In obtaining a sample of scales of 
semantic judgment, it was decided to use a 
frequency-of-usage or availability criterion. 
Forty nouns were taken from the Kent-Rosanoff 
list of stimulus words for free association and 
these were read in fairly rapid succession to a 
group of approximately 200 undergraduate
students. These subjects were instructed to 
write down after each stimulus noun the first 
descriptive adjective that occurred to them (e.g., 
TREE—green, HOUSE—big, PRIEST—good). These 
subjects were asked not to search for exotic 
qualifiers, but simply to give whatever occurred 
to them immediately, and the rapid rate of pre- 
sentation further restricted the likelihood of 
getting rare associates. These data were then 
analyzed for frequency of occurrence of all 
adjectives, regardless of the stimulus words with 
which they had appeared. As might be ex- 
pected, the adjectives good and bad occurred with 
frequencies more than double those of any other 
adjectives. Perhaps less expected was the fact 
that nearly half of the 50 most frequently appearing adjectives were also clearly evaluative
in nature. Also among the frequently given 
adjectives were most of the common sensory 
discriminanda, however, such as heavy-light, 
sweet-sour, and hot-cold. These frequently used
adjectives were made into sets of polar opposites 
and served as the sample of descriptive scales 
used in this study. For theoretical reasons, a 
few additional sensory continua were inserted in 
this set of 50; these scales were pungent-bland, 
fragrant-foul, and bright-dark. The kind of bias 
that this method of sampling has probably intro- 
duced will be considered later under “Discus- 
sion.” The entire set of scales is given in 
Table 1. 
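The frequency-of-usage tally described above amounts to counting adjectives while ignoring the stimulus noun each one followed. The sketch uses a handful of invented responses in the style of the examples given (TREE-green, HOUSE-big, PRIEST-good). The step of pairing frequent adjectives into polar opposites was a judgment made by the experimenters and is not modeled.

```python
from collections import Counter

# Hypothetical free-association records: (stimulus noun, first adjective given).
responses = [
    ("TREE", "green"), ("HOUSE", "big"), ("PRIEST", "good"),
    ("HOUSE", "good"), ("TREE", "big"), ("PRIEST", "kind"),
]

# Tally adjectives regardless of the stimulus words with which they appeared.
adjective_counts = Counter(adjective for _, adjective in responses)
for adjective, count in adjective_counts.most_common():
    print(adjective, count)
```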

The sampling of concepts presented a less 
critical problem, since our purpose was a factor 
analysis of scales of judgment rather than of 
concepts. It was important, however, that these 
concepts be different from those on which the adjective sample had been based (the 40 original
stimulus words from the Kent-Rosanoff lists), 
that they be as diversified in meaning as possible 
so as to augment the total variability in judg- 
ments, and that they be familiar to the subjects 
we intended to use. On these bases the experi- 
menters simply selected the following 20 con- 
cepts: LADY, BOULDER, SIN, FATHER, LAKE, 
SYMPHONY, RUSSIAN, FEATHER, ME, FIRE, BABY, 
FRAUD, GOD, PATRIOT, TORNADO, SWORD, MOTHER, 
STATUE, COP, AMERICA. 

Ideally, the sample of subjects (Ss) for this 
type of analysis would be a representative cross 
section of the general population. As is so often 
the case, however, the availability and test 
sophistication of the college student population 
dictated our choice. A group of 100 students in 
introductory psychology served as Ss; they were 
well paid for their work, and internal evidence 
testifies to the care with which they did what was 
a long and not very exciting task. 

Procedure.—The pairing of 50 descriptive
scales with 20 concepts in all possible combi- 
nations generates a 1000-item test form. For 
checking reliability, 40 of these 1000 items, 
chosen at random but with the restriction that 
no concept should be used more than twice and 
no scale more than once, were repeated as a final 
page of the mimeographed test booklet. The 
ordering of concept-scale pairings was deliber- 
ately rotated rather than random; it was felt 
that this procedure would better guarantee 
independence of judgments, since the maximum 
of 19 items would intervene between successive 
judgments of the same concept and the maximum 
of 49 items would intervene between successive 
judgments on the same scale. Each item ap- 
peared as follows: 


LADY:

rough ___:___:___:___:___:___:___ smooth

with S instructed to place a check mark in the position indicating both the direction and intensity of his association. The method is thus
a combined scaling and controlled association 
procedure. The exact instructions were as 
follows : 

“The purpose of this study is to measure the 
meanings of certain words to various people by 
having them judge each word against a series of 
descriptive scales. In taking this test, please 
judge the words on the basis of what they mean 
to you. Each numbered item presents a CONCEPT (such as DICTATOR) and a scale (such as high-low). You are to rate the concept on the 7-point
scale indicated. 

If you felt that the concept was very closely 
associated with one end of the scale, you might 
place your check mark as follows: 


DICTATOR:

up ___:___:___:___:___:___:_X_ down





“If you felt that the concept was quite closely 
related to one side of the scale, you might check 
as follows: 


HOUSE:

straight ___:_X_:___:___:___:___:___ crooked





“If the concept seemed only slightly related to 
one side as opposed to the other, you might check 
as follows: 


CLOUD:

easy ___:___:_X_:___:___:___:___ difficult





“If you considered the scale completely ir- 
relevant, or both sides equally associated, you 
would check the middle space on the scale: 

TREE:

idealistic ___:___:___:_X_:___:___:___ realistic





“Sometimes you may feel as though you have
had the same item before on the test. This will 
not be the case; every item is different from every 
other item. So do not look back and forth 
throughout the test. Also, do not try to remember 
how you marked similar items earlier in the test. 
Make each item a separate and independent judg- 
ment. Work at fairly high speed, without wor- 
rying or puzzling over the individual items for 
long periods. It is your first impressions that 
we want. 

“Of course, some of the items will seem highly 
irrelevant to you. It was necessary, in the 
design of this test, to match every concept with 
every scale at some place, and this is why some 
items seem irrelevant—so give the best judgment 
you can and move along. 

“Do not try to complete the whole form in 
one sitting. As soon as you begin to feel a little 


fatigued—as soon as the meanings of the con- 
cepts begin to get a little ‘fuzzy’ in your mind— 
put this test aside for a while and do something 
else.” 


Treatment of data.—The combination of scales, concepts, and Ss used in this study generates a 50 × 20 × 100 cube of data, as shown by the model in Fig. 1. Each scale position was assigned a number, from 1 to 7, arbitrarily from left to right, and hence each cell in this cube contains a number representing the judgment of a particular concept, on a particular scale, by a particular subject. These data were punched into IBM cards in the order in which they appeared in the test forms. The first step in treatment of the data was to re-order them in such a way as to match the model, i.e., so that each subject would have a separate card for each of the 20 concepts judged, with the scales running in constant order from 1 through 50. These cards were then arranged into 20 blocks, one for each concept, each containing the complete ordered data for 100 subjects.

Fig. 1. Three-dimensional raw-score data matrix, obtained when a group of Ss (1, 2, . . . s) judges a sample of concepts (A, B, . . . N) against a set of semantic scales (a, b, . . . n). Each cell contains a number from 1 to 7, representing the judgment of a particular concept on a particular scale by a single subject.

FACTOR ANALYSIS OF MEANING 329

Reliability check—The reliability 
with which semantic judgments are 
made with this 7-point graphic scale 
procedure was estimated by applying 
the retest method to the sample of 
40 duplicated items. It will be re- 
called that the last page of the test 
form was made up of a randomly 
selected set of items from the 1000- 
item test form proper. In this con- 
nection it is interesting to note that, 
when questioned at the conclusion of 
their work, not a single S realized 
that these were repeat items. As they had been warned, there were many similar items to judge (e.g., item 87 might be LAKE tense-relaxed and item 592 might be LAKE calm-agitated), and therefore the actual repeats were not recognized as such. The coefficient of reliability
was calculated by correlating pairs of 
scores, original and repeat check 
positions, for the same items. The 
summation of cross products and the 
summations for means and variances 
were taken across both Ss and items. 
Since there were 40 duplicated items 
per subject, and 100 Ss, 4000 pairs 
contributed to the reliability co- 
efficient. The resulting uncorrected 
coefficient was .85. This, it should 
be noted, is the reliability with which 
positions on these scales are checked, 
as estimated over a brief time period, 
not the reliability of any particular 
scale. Inspection of the data indi- 
cates that certain items may be more 
stable than others over the Ss as a 
group and that certain positions on 
the scale (particularly the extremes, 
1 or 7) are more stable than others. 
This is to be expected from the nature 
of the task, where concepts of am- 
biguous or indefinite meaning will 
tend to be allocated to positions near 
the neutral point (Position 4). Considering the speed with which these responses are elicited from Ss (better than 10 items per minute on the average), this represents a reasonably high degree of stability.
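A minimal sketch of the retest coefficient described above, assuming the original and repeat check positions are held as two 100 × 40 arrays (subjects × duplicated items). As in the text, all 4000 pairs are pooled into a single Pearson r, so cross products, means, and variances are taken across both Ss and items. The ratings are randomly generated stand-ins, not the study's data, and the function name `pooled_retest_r` is hypothetical.

```python
import numpy as np

def pooled_retest_r(first, repeat):
    """Pearson r between original and repeat check positions, pooling
    every (subject, item) pair into one set of deviations."""
    x = np.asarray(first, dtype=float).ravel()
    y = np.asarray(repeat, dtype=float).ravel()
    dx = x - x.mean()   # deviations from the grand mean over Ss and items
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical stand-in data: 100 Ss x 40 duplicated items, positions 1-7.
rng = np.random.default_rng(0)
first = rng.integers(1, 8, size=(100, 40))
# Simulated repeats: usually the same position, sometimes one step away.
shift = rng.choice([-1, 0, 0, 0, 1], size=first.shape)
repeat = np.clip(first + shift, 1, 7)

print(first.size)                                # 4000 pairs
print(round(pooled_retest_r(first, repeat), 2))  # an uncorrected coefficient
```

Because deviations are taken from the grand means, the result is identical to `np.corrcoef` applied to the two flattened arrays.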

Matrix of intercorrelations.—Refer- 
ring back to Fig. 1, it can be seen that 
each S provides a complete set of 50 
judgments on each concept—each 
vertical column is such a set. Since 
both Ss and concepts are replicated, 
it would be possible to obtain separate 
matrices of scale intercorrelations for 
individual Ss (summing over concepts) as well as for individual concepts (summing over subjects).
However, since our long-run purpose 
was to set up a semantic measuring 
instrument which would be applicable 
to people and concepts in general, we 
wished to obtain that matrix of intercorrelations among scales which would be most representative or typical. We have therefore summed over both Ss and concepts, generating a single 50 × 50 intercorrelational matrix of every scale with every other scale to which the total data contribute.
Another reason for summing over concepts was to avoid spuriously low correlations resulting from low variability of judgments on single concepts. If nearly all Ss call TORNADO extremely cruel and also agree in calling it extremely unpleasant, the correlation between kind-cruel and pleasant-unpleasant would approach indeterminacy, despite the fact that over concepts in general there is a high positive correlation between these scales.
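The TORNADO argument can be made concrete with a small, hypothetical example. In the extreme case where every S gives the same rating on both scales, the within-concept correlation has zero variance in its denominator and is undefined. Once other concepts are pooled in, the same ratings contribute to a well-defined correlation. The ratings and the helper `pearson` are illustrative assumptions, not the study's data (1 = kind or pleasant end, 7 = cruel or unpleasant end).

```python
import numpy as np

def pearson(x, y):
    """Pearson r; returns nan when either variable has zero variance,
    i.e., when the correlation is indeterminate."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float("nan") if denom == 0 else float((dx * dy).sum() / denom)

# Hypothetical ratings: 1 = kind / pleasant end, 7 = cruel / unpleasant end.
tornado_kind_cruel = [7, 7, 7, 7, 7]   # every S: extremely cruel
tornado_pleasant = [7, 7, 7, 7, 7]     # every S: extremely unpleasant
other_kind_cruel = [1, 2, 4, 5, 3]     # ratings of other (hypothetical) concepts
other_pleasant = [1, 3, 4, 5, 2]

within = pearson(tornado_kind_cruel, tornado_pleasant)
pooled = pearson(tornado_kind_cruel + other_kind_cruel,
                 tornado_pleasant + other_pleasant)
print(within)            # nan: undefined within the single concept
print(round(pooled, 2))  # 0.98 once other concepts are pooled in
```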

Each of our 50 scales was responded 
to 2000 times, each of the 100 Ss re- 
sponding once to each of 20 concepts. 
Thus, every scale can be paired with 
every other scale 2000 times, each S 
contributing 20 pairs to the total, and 
each concept contributing 100 pairs. 
In computing each correlation, the 
summations for cross products, means, 
and variances were taken across both 
Ss and concepts. In symbolic form, if $X_{ij\alpha}$ is the score on the ith scale, for the jth concept, and the αth subject, and $\bar{X}_{i\cdot\cdot}$ is the mean for the ith scale found by summing over concepts and Ss and dividing by 20 × 100, then the cross products between scales i and k in deviations from the means were found from:

$$\sum_{j}\sum_{\alpha}\left(X_{ij\alpha}-\bar{X}_{i\cdot\cdot}\right)\left(X_{kj\alpha}-\bar{X}_{k\cdot\cdot}\right) \qquad (1)$$

The expression for the variance on scale i is then:

$$\sigma_{i}^{2}=\frac{\sum_{j}\sum_{\alpha}\left(X_{ij\alpha}-\bar{X}_{i\cdot\cdot}\right)^{2}}{N} \qquad (2)$$

where N = 20 × 100 = 2000 is the number of judgments on each scale.
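Equations (1) and (2) amount to an ordinary Pearson correlation computed over all 2000 concept-subject combinations for each pair of scales. A minimal NumPy sketch, assuming the raw scores are held in an array shaped like the cube in Fig. 1 (scales × concepts × subjects). The random ratings and the function name are hypothetical placeholders.

```python
import numpy as np

def scale_intercorrelations(X):
    """X has shape (scales, concepts, subjects), e.g. 50 x 20 x 100.
    Correlations between scales are pooled over every concept-subject
    combination, with deviations taken from each scale's grand mean."""
    n_scales = X.shape[0]
    flat = X.reshape(n_scales, -1).astype(float)   # each row holds N scores
    N = flat.shape[1]
    dev = flat - flat.mean(axis=1, keepdims=True)  # X_ija - mean of scale i
    cross = dev @ dev.T                            # equation (1) for every pair i, k
    var = (dev ** 2).sum(axis=1) / N               # equation (2)
    return (cross / N) / np.sqrt(np.outer(var, var))

rng = np.random.default_rng(1)
X = rng.integers(1, 8, size=(50, 20, 100))         # hypothetical 1-7 ratings
R = scale_intercorrelations(X)
print(R.shape)                                     # (50, 50)
```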





These intercorrelations were calcu- 
lated with IBM equipment. To make 
possible a later analysis of factor 
structures for individual concepts, 
subtotals were printed for each block 
of 100 cards representing the total 
data for each concept. The matrix 
of correlations whose factorization is 
reported here was based on summation 
over all concepts. Therefore the variance due to differences between concept means (the differences among the $\bar{X}_{ij\cdot}$ values) is necessarily included in the correlation values. The possible effect
of this on our results will be discussed 
in greater detail at a later point. 
Factor analysis.—Thurstone’s Centroid Factor Method (5) was applied to this matrix of correlations. Four factors were extracted and rotated into simple structure, maintaining orthogonality.¹ The rotated factor matrix appears as Table 1. Since orthogonal relations were maintained in rotation, the matrix in this table represents uncorrelated factors. We stopped extracting factors after the fourth; this factor accounted for less than 2% of the variance and appeared by inspection to be a residual: the pattern of scales having noticeable loadings on it (between .20 and .27) made no sense semantically. It is to be expected that a larger sampling of scales, with less emphasis on the evaluative factor, would allow a number of additional factors of a denotative sort to appear.

¹ Due to space limitations, the original 50 × 50 correlation matrix, the unrotated factor matrix, and the transformation matrix are not included in this paper. They may be obtained by writing to the authors. The writers wish to thank Mr. Kellogg Wilson for the work he did for us on these rotations and also those involved in the second analysis.
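For readers unfamiliar with the centroid method, the following is a simplified sketch of the idea, not a reconstruction of the authors' computing routine. Communality estimates (by default, the largest absolute correlation in each column) are placed in the diagonal. Variables are then sign-reflected so that column sums are positive, each loading is the reflected column sum divided by the square root of the grand sum, and the factor is removed before the next one is extracted. Rotation to simple structure, which the study also performed, is not shown. The function name and the reflection rule are assumptions made for illustration.

```python
import numpy as np

def centroid_factors(R, n_factors, communalities=None):
    """Simplified centroid extraction: returns an (n_variables, k) array of
    unrotated loadings, k <= n_factors."""
    R = np.array(R, dtype=float)                  # work on a copy
    n = R.shape[0]
    if communalities is None:
        # Default estimate: largest absolute off-diagonal r in each column.
        off = np.abs(R - np.diag(np.diag(R)))
        communalities = off.max(axis=0)
    np.fill_diagonal(R, communalities)
    loadings = []
    for _ in range(n_factors):
        # Reflect signs: flip any variable whose signed off-diagonal column
        # sum is negative. Each flip raises the grand sum, so the loop ends.
        s = np.ones(n)
        while True:
            Rs = R * np.outer(s, s)
            col = Rs.sum(axis=0) - np.diag(Rs)
            j = int(np.argmin(col))
            if col[j] >= 0:
                break
            s[j] = -s[j]
        total = Rs.sum()
        if total <= 1e-12:                        # nothing left to extract
            break
        a = s * Rs.sum(axis=0) / np.sqrt(total)   # centroid loadings, reflection undone
        loadings.append(a)
        R = R - np.outer(a, a)                    # residual matrix for the next factor
    if not loadings:
        return np.zeros((n, 0))
    return np.column_stack(loadings)

# Usage: a matrix generated by one known factor, with exact communalities.
lam = np.array([0.9, -0.8, 0.7, 0.6])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
F = centroid_factors(R, n_factors=1, communalities=lam ** 2)
print(np.round(F[:, 0], 2))
```

With a single-factor matrix and exact communalities, the sketch recovers the generating loadings, including the negative one, once the reflection is undone.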

The problem of labeling factors is 
somewhat simpler here than in the 
usual case. In a sense, our polar 
scales label themselves as to content. 
The first factor is clearly identifiable 
as evaluative by listing the scales which 
have high loadings on it: good-bad, beautiful-ugly, sweet-sour, clean-dirty, tasty-distasteful, valuable-worthless, kind-cruel, pleasant-unpleasant, bitter-sweet, happy-sad, sacred-profane, nice-awful, fragrant-foul, honest-dishonest, and fair-unfair. All of these loadings are .75 or better, and it will also be noted by referring to Table 1 that these scales are “purely” evaluative in the sense that the extracted variance is almost entirely in this first factor. Several other scales, rich-poor, clear-hazy, fresh-stale, and healthy-sick, while not as highly loaded as the first set on the evaluative factor, nevertheless restrict their loadings chiefly to this factor.

The second factor identifies itself fairly well as a potency variable (or, as one of our undergraduate statistical assistants puts it, a “football player” factor): large-small, strong-weak, heavy-light, and thick-thin serve to identify its general nature, these scales having the highest and most restricted loadings. The tendency for scales representing this factor to be contaminated, as it were, with the evaluative factor is apparent in Table 1. The following scales are mainly potency continua, but reflect considerable evaluative meaning as well: hard-soft, loud-soft, deep-shallow, brave-cowardly, bass-treble, rough-smooth, rugged-delicate, and wide-narrow. It also should be noted from inspection of this table that in general loadings on the evaluative factor are higher than those on potency, even where “pure” scales are involved.

TABLE 1

ROTATED FACTOR LOADINGS: ANALYSIS I

                                          Loadings
Adjective Pairs                 I     II    III     IV     h²
 1. good-bad                  .88    .05   -.09    .09    .79
 2. large-small               .06    .62    .34    .04    .51
 3. beautiful-ugly            .86    .09    .01    .26    .82
 4. yellow-blue              -.33   -.14    .12    .17    .17
 5. hard-soft                -.48    .55    .16    .21    .60
 6. sweet-sour                .83   -.14   -.09    .02    .72
 7. strong-weak               .19    .62    .20   -.03    .46
 8. clean-dirty               .82   -.05    .03    .02    .68
 9. high-low                  .59    .21    .08    .04    .40
10. calm-agitated             .61    .00   -.36   -.05    .50
11. tasty-distasteful         .77    .05   -.11    .00    .61
12. valuable-worthless        .79    .04    .13    .00    .64
13. red-green                -.33   -.08    .35    .22    .28
14. young-old                 .31   -.30    .32    .01    .29
15. kind-cruel                .82   -.10   -.18    .13    .73
16. loud-soft                -.39    .44    .23    .22    .45
17. deep-shallow              .27    .46    .14   -.25    .37
18. pleasant-unpleasant       .82   -.05    .28   -.12    .77
19. black-white              -.64    .31    .01   -.03    .51
20. bitter-sweet             -.80    .11    .20    .03    .69
21. happy-sad                 .76   -.11    .00    .03    .59
22. sharp-dull                .23    .07    .52   -.10    .34
23. empty-full               -.57   -.26   -.03    .18    .43
24. ferocious-peaceful       -.69    .17    .41    .02    .67
25. heavy-light              -.36    .62   -.11    .06    .53
26. wet-dry                   .08    .07   -.03   -.14    .03
27. sacred-profane            .81    .02   -.10    .01    .67
28. relaxed-tense             .55    .12   -.37   -.11    .47
29. brave-cowardly            .66    .44    .12    .03    .64
30. long-short                .20    .34    .13   -.23    .23
31. rich-poor                 .60    .10    .00   -.18    .40
32. clear-hazy                .59    .03    .10   -.16    .38
33. hot-cold                 -.04   -.06    .46    .07    .22
34. thick-thin               -.06    .44   -.06   -.11    .21
35. nice-awful                .87   -.08    .19    .15    .82
36. bright-dark               .69   -.13    .26    .00    .56
37. treble-bass               .33   -.47    .06   -.02    .33
38. angular-rounded          -.17    .08    .43    .12    .23
39. fragrant-foul             .84   -.04   -.11    .05    .72
40. honest-dishonest          .85    .07   -.02    .16    .75
41. active-passive            .14    .04    .59   -.02    .37
42. rough-smooth             -.46    .36    .29    .10    .44
43. fresh-stale               .68    .01    .22   -.11    .52
44. fast-slow                 .01    .00    .70   -.12    .50
45. fair-unfair               .83    .08   -.07    .11    .71
46. rugged-delicate          -.42    .60    .26    .27    .68
47. near-far                  .41    .13    .11   -.05    .20
48. pungent-bland            -.30    .12    .26    .05    .17
49. healthy-sick              .69    .17    .09    .02    .59
50. wide-narrow               .26    .41   -.07   -.11    .25
% of total variance         33.78   7.62   6.24   1.52  49.16
% of common variance        68.55  15.46  12.66   3.08  99.75

The third factor appears to be 
mainly an activity variable in judg- 
ments, with some relation to physical 
sharpness or abruptness as well. The 
most distinctively loaded scales are 
fast-slow (.70), active-passive (.59), and hot-cold (.46); somewhat different in apparent meaning, but displaying similar factor loadings, are sharp-dull (.52) and angular-rounded (.43). The following scales have considerable loading on this activity factor, but also as much or more loading on evaluation: red-green, young-old (our Ss were college undergraduates), ferocious-peaceful, and tense-relaxed.
The noticeable tendency for both 
activity and power to be associated 
with positive evaluation (e.g., good, 
strong, active tend to go together rather 
than good, weak, passive) is probably 
a cultural semantic bias. All we can 
say is that there appear to be inde- 
pendent factors operating, even though 
it is difficult to find many specific 
scales which are orthogonal with 
respect to evaluation. 

The percentages of total variance 
and common variance at the bottom 
of Table 1 confirm the dominant role 
of evaluation in semantic judgments 
and further indicate that the three 
factors we have isolated account for 
approximately 50% of the total vari- 
ance in judgments. Of the common 
factor variance, about 70% is evalua- 
tive. 
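The h² column and the bottom rows of Table 1 follow from simple sums of squared loadings. A short sketch using four rows copied from Table 1. The helper name `variance_percentages` is hypothetical. With only four rows, the percentages it returns are not those printed in the table, which are based on all 50 scales.

```python
import numpy as np

# Four rows of Table 1 (Factors I-IV), copied from the table.
names = ["good-bad", "large-small", "beautiful-ugly", "fast-slow"]
loadings = np.array([
    [0.88, 0.05, -0.09, 0.09],
    [0.06, 0.62, 0.34, 0.04],
    [0.86, 0.09, 0.01, 0.26],
    [0.01, 0.00, 0.70, -0.12],
])

# Communality h^2: the share of a scale's variance carried by the four factors.
h2 = (loadings ** 2).sum(axis=1)
for name, value in zip(names, h2):
    print(f"{name:15s} h2 = {value:.2f}")   # compare with the h^2 column

def variance_percentages(L, n_variables):
    """Percent of total variance (sum of squared loadings over the number of
    variables) and percent of common variance (share of the summed h^2)
    for each factor."""
    ss = (L ** 2).sum(axis=0)
    return 100 * ss / n_variables, 100 * ss / ss.sum()

pct_total, pct_common = variance_percentages(loadings, n_variables=len(loadings))
```

Applied to all 50 rows with `n_variables=50`, the same two formulas describe how the “% of total variance” and “% of common variance” rows are formed.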


ANALYSIS II: FORCED-CHOICE DATA²


The method for obtaining correlations described in the previous section, by summing over both concepts and subjects, necessarily includes the variance attributable to differences between concepts. Does this source of variance influence the factor results? To the extent that there are differences in factor structure between concepts, and to the extent that our sampling of concepts was nonrepresentative, the factor-analytic results could be biased. One way to get at the contribution of particular concepts is to check the degree of correlation between specific scales within each of the 20 concepts separately. When this was done by using the good-bad scale as a reference and obtaining the correlations of all other scales with this reference, the sizes of correlations were found to vary considerably with the concept involved.

² The writers wish to express their appreciation of the work done by Mrs. Joan Dodge in collecting and analyzing the data for this study.

Rather than to reanalyze our entire 
data with statistical procedures which 
would eliminate concept variance, it 
was decided to do another analysis of 
the same scales with a new population 
of subjects, but to employ a method of 
collecting data which would itself 
eliminate specific concept differences. 
The method used involves a forced 
choice between pairs of descriptive 
scales as to the direction in which they 
should be aligned, with no concept 
being specified. This method had been used in an earlier study on synesthetic thinking (1) and referred to as a “parallel polarity” test. If a group of Ss is given the following item,

SHARP-dull; relaxed-tense,

and asked to encircle the one of the second pair which is closest in meaning to the capitalized word in the first pair, there is no restriction on the concept (if any) that may be used. Some Ss might be thinking of “people” concepts, others of “object” concepts, and yet others of “aesthetic” concepts, and so forth. Introspectively (as judged by comments of individuals taking such a test), there is often no particular concept involved. If 100% of the subjects select tense, as might happen above, this would indicate that sharp-tense vs. dull-relaxed is a generally appropriate parallelism, regardless of type of concept; if subjects divide randomly (e.g., 50% one way) on an item, for example,

FRESH-stale; long-short,


it would appear that either the multi- 
tude of conceptual contexts in which 
these qualities might be related are 
random as to direction or that subjects 
differ randomly in their absolute 
judgments of relation. In any case, 
no particular concept or set of con- 
cepts is forcing the direction of 
relation. 


Procedure.—The pairing of each of 50 scales 
(the same as used in the first analysis) with every 
other scale generates a test comprising 1,225 
items. Again, a rotational procedure was fol- 
lowed to maximize the separation of identical 
scales. A total of 40 subjects was used, of the 
same type used in the previous analysis but not 
including any of the same individuals. The 
exact instructions were as follows: 
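The item count follows from taking each unordered pair of the 50 scales once: 50 × 49 / 2 = 1,225. A trivial check, with placeholder scale labels. The rotational ordering used to separate identical scales is not specified in the text and is not reproduced here.

```python
from itertools import combinations

scales = [f"scale_{k}" for k in range(1, 51)]   # placeholder labels for the 50 scales
items = list(combinations(scales, 2))           # every unordered pair exactly once
print(len(items))                               # 1225
```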

We want to find out what dimensions of meaning are related and what the basic factors in the system seem to be. This is a very important problem for building any measuring instrument, and we ask your complete cooperation in carrying out the following instructions.


Procedure to Follow: 


a. Each item you see will be composed of two 
pairs of words. Your job is to encircle the 
word in the second pair which goes best with 
the capitalized word in the first pair. 


STRAIGHT—crooked noble—bestial 


b. Don’t look back over the judgments you have 
already completed. Judge each item by 
itself. 

c. Be sure to look at both words in each pair, so 
as to be judging the relation of the scales as 
wholes. 

d. Check back after you have made each judg- 
ment to be sure you answered the way you 
wanted to. Correct any judgment that you 
feel was not what you meant. 

e. Try not to base your judgments on your 
likes or dislikes of particular individual words. 
It is the relation among scales as wholes that 
you are judging. 
General Suggestions:


a. Some relations will be immediately obvious. 
With others it will be harder for you to make 
your decision. In some cases it might seem 
that both words could do equally well. Do 
not waste time worrying over any single item 
—we want your first impression. On the 
other hand, do a careful conscientious job. 
The results will be worthless if you do your 
work thoughtlessly. 

b. Do not work so long at one stretch that you 
become fatigued. Distribute your work on 
this test as you see fit. We would like to 
have it returned within one week. 

c. Be sure you do all the judging yourself. 
Forms filled out by more than one person 
would be worse than useless for our purpose. 

d. Return the form to this class no later than one 
week from today. 

e. If, for any reason, you feel that you cannot 
comply with the above instructions, please 
return the incomplete form to us. You may 
still keep your dollar, since no data at all is 
preferable to erroneous data. 


Treatment of data.—The measure of relation used in this analysis was simply the percentage of agreement between scales, i.e., the percentage of persons who associated one of the right-hand terms with the capitalized adjective. In the first example above (SHARP-dull; relaxed-tense), the number of Ss who circled relaxed as going with SHARP was divided by 40 (the number of Ss), and the resulting percentage was entered into a 50 × 50 matrix of percentages of relation³ between descriptive terms. Since the number of persons circling one of the terms directly determines the number circling the other, calculations were necessary for only one term. The left-hand term was chosen since this corresponded to the original direction taken as positive in the first study. A perfect relation is inferred from 100%; 50% indicates no relation, since equal numbers of Ss choose both terms; less than 50% indicates that the terms are negatively related in their given positions (e.g., as in the first illustration above). The resulting 50 × 50 matrix of percentages was factored by a technique described below, and the results were compared with those of the original centroid analysis.

³ The 50 × 50 matrix of percentages, the unrotated dimension matrix, and the transformation matrix are not included here because of space limitations. They may be obtained from the authors.
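A minimal sketch of how such a matrix could be assembled. It assumes the responses have already been tallied as the number of Ss (out of 40) who circled the left-hand term of the second pair, and that the matrix is filled symmetrically. The helper `to_relation` rescales percentages so that 50% becomes 0 and 100% becomes 1.00. This rescaling is an assumption introduced to match the zero-relation origin and the 1.00 diagonal mentioned in the factor-analysis description; it is not a documented step. The counts and function names are hypothetical.

```python
import numpy as np

def percentage_matrix(n_scales, left_counts, n_subjects):
    """left_counts[(i, j)] = number of Ss who circled the left-hand term of
    scale j as going with the capitalized left-hand term of scale i."""
    P = np.full((n_scales, n_scales), np.nan)    # untested pairs stay NaN
    np.fill_diagonal(P, 100.0)                   # each scale goes with itself
    for (i, j), count in left_counts.items():
        P[i, j] = P[j, i] = 100.0 * count / n_subjects
    return P

def to_relation(P):
    """Map 50% -> 0 (no relation), 100% -> 1, 0% -> -1 (assumed rescaling)."""
    return (P - 50.0) / 50.0

# Hypothetical tallies. Scales: 0 sharp-dull, 1 relaxed-tense,
# 2 fresh-stale, 3 long-short.
counts = {(0, 1): 0,    # nobody circled relaxed with SHARP
          (2, 3): 20}   # FRESH vs. long/short split evenly
P = percentage_matrix(4, counts, n_subjects=40)
print(P[0, 1], P[2, 3])          # 0.0 50.0
print(to_relation(P)[0, 1])      # -1.0
```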

Factor analysis.—Since the method of factoring applied to the matrix of percentages has only recently been developed and is as yet unpublished, it is necessary to give the basic notion of the method here. The 50 variables are viewed as mutually orthogonal axes in k space, where k = 50. These axes intersect in an origin that corresponds to zero relation; in this study, zero relation corresponds to 50%. The percentages of relation obtained for each variable are considered coordinates fixing the variable as a point in k space. The higher the coordinate of a variable on any dimension, the higher the relation with that dimension. The aim is to find a minimum number of orthogonal dimensions which adequately describe the k-space structure in terms of a set of coordinates for each variable on the dimensions. These new coordinates are taken as indicating the degrees of relation of the variable to the minimum number of dimensions.

One method of deriving these coordinates is the following: If P, the matrix of percentages, is filled with 1.00 in the main diagonal (one would expect each variable to be chosen as going with itself 100% of the time), Thurstone’s diagonal method (cf. 5, pp. 101-105) can be applied to a new matrix generated by PP′, and the resulting factor loadings will be the coordinates described above.


A mathematically identical method, but one which represents a considerable saving in time over the above process, was used in this study.⁴ This technique is briefly described below. It can easily be shown that if a dimension passes through point j, then the coordinate of variable i on this dimension is given by

$$c_{ij} = \frac{D_{ij}^{2} - L_{i}^{2} - L_{j}^{2}}{-2L_{j}}$$

where $c_{ij}$ = the coordinate of i on a dimension passing through j, $D_{ij}$ = the Euclidean distance between i and j in k space, and $L_i$ and $L_j$ = the vector lengths from the origin to points i and j, respectively. For subsequent dimensions, D and L are reduced to their values in the reduced space, and the reduced values are applied in the above formula. Each new dimension is selected to pass through one of the variables. This formula is applied repeatedly until some criterion for stopping is reached or until the vector lengths are reduced to zero. Unlike factor loadings, the coordinates may have absolute values greater than 1.00.

⁴ The operations involved in this technique are presently described in mimeographed form.
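The coordinate formula is the law of cosines rearranged. Since $D_{ij}^2 = L_i^2 + L_j^2 - 2\,\mathbf{x}_i\cdot\mathbf{x}_j$, it gives $c_{ij} = \mathbf{x}_i\cdot\mathbf{x}_j / L_j$, the projection of point i on the direction of point j. That is also what Thurstone's diagonal method yields when applied to PP′ and pivoted on variable j, which is why the two procedures agree. The sketch below uses hypothetical function names and random points. It computes coordinates from distances and lengths alone, removes each new dimension to form the reduced space, and checks the result against the PP′ route. The choice of which variable each dimension passes through is left to the caller.

```python
import numpy as np

def coordinates_through(points, j):
    """Coordinate of every variable on a dimension passing through point j,
    using c_ij = (D_ij^2 - L_i^2 - L_j^2) / (-2 L_j)."""
    X = np.asarray(points, dtype=float)
    L = np.linalg.norm(X, axis=1)          # vector lengths from the origin
    D = np.linalg.norm(X - X[j], axis=1)   # Euclidean distances to point j
    return (D ** 2 - L ** 2 - L[j] ** 2) / (-2.0 * L[j])

def extract_dimensions(points, picks):
    """Pass one dimension through each variable in picks, in order,
    reducing the space after each extraction."""
    X = np.asarray(points, dtype=float).copy()
    columns = []
    for j in picks:
        c = coordinates_through(X, j)
        u = X[j] / np.linalg.norm(X[j])    # unit vector along the new dimension
        X = X - np.outer(c, u)             # project that dimension out
        columns.append(c)
    return np.column_stack(columns)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(5, 5))       # hypothetical rescaled rows of P
G = X @ X.T                                # the PP' matrix
c_distance = coordinates_through(X, 2)
c_diagonal = G[:, 2] / np.sqrt(G[2, 2])    # diagonal method pivoted on variable 2
C = extract_dimensions(X, picks=[2, 0, 4, 1, 3])
```

Because each new dimension lies in the reduced space, the dimensions are mutually orthogonal. Once the space is exhausted, each variable's squared coordinates sum to its squared original length.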

After the fifth dimension had been extracted, it became clear that only dimensions with single high coordinates (“specifics”) would continue to emerge. Analysis was therefore discontinued.
These dimensions were then rotated graphically 
in an attempt to maximize the similarity between 
this structure and that obtained with the 
centroid method. The rotated dimension 
matrix appears as Table 2. 


COMPARISON OF ANALYSES I AND II


The purpose of the rotation in the second analysis was to determine to what extent the factors isolated in the original centroid analysis, using concept-scale pairings, could also be demonstrated in the dimensional analysis, based on forced-choice judgments among scales themselves. We shall refer to “loadings” of variables on “factors” in speaking of results of the centroid method and to “coordinates” of variables on “dimensions” in speaking of results of the second method. Similarity between the results of the two methods was gauged in three ways: (a) qualitatively, by the extent to which variables heavily loaded on factors have high coordinates on dimensions, (b) by the magnitude of correlation between factor loadings and dimension coordinates across variables, and (c) by the magnitude of indices of factorial similarity, φ.⁵ “Heavily loaded” and “high coordinates” were defined by arbitrarily selected criterion values: the criteria for “heavily loaded” were that variables have loadings > .80, > .50, and > .50 for Factors I, II, and III, respectively; the criteria for “high coordinates” were that variables have coordinates > 2.25, > 1.30, and > 1.30 for Dimensions I, II, and III, respectively.

⁵ We thank Dr. C. F. Wrigley for bringing this measure to our attention. The measure, $\varphi_{ij}$, is found from

$$\varphi_{ij} = \frac{\sum_{k} f_{ki}\, g_{kj}}{\sqrt{\sum_{k} f_{ki}^{2}\,\sum_{k} g_{kj}^{2}}}$$

where $f_{ki}$, $g_{kj}$ represent the loadings for the kth variable on the ith and jth factors. References to the use of this index are found in (1) and (6).

TABLE 2

ROTATED DIMENSION COORDINATES: ANALYSIS II

                                        Coordinates
Adjective Pairs                 I     II    III     IV      V
 1. good-bad                 2.29    .84    .07      ?      ?
 2. large-small               .12   1.76   -.02   1.00   -.34
 3. beautiful-ugly           2.40    .41    .38   1.48   -.01
 4. yellow-blue              -.31   -.27   -.15    .73   -.44
 5. hard-soft               -1.39   1.06    .68    .45    .39
 6. sweet-sour               2.29    .71    .14    .98   -.26
 7. strong-weak               .38   1.81    .67   1.36   -.53
 8. clean-dirty              2.38    .46    .60   1.26   -.06
 9. high-low                 1.35   1.21   1.00   1.00   -.26
10. calm-agitated            2.25    .36   -.62    .48   -.14
11. tasty-distasteful        2.11   1.05    .21   1.21   -.33
12. valuable-worthless       1.87   1.12    .25   1.53   -.46
13. red-green                -.59   1.03    .78      ?   -.19
14. young-old                1.22    .83   1.26    .87   -.33
15. kind-cruel               2.40    .49   -.18   1.23   -.23
16. loud-soft               -1.71   1.03    .61    .69    .06
17. deep-shallow              .30   1.46   -.65    .72    .97
18. pleasant-unpleasant      2.38    .56    .24   1.38   -.29
19. black-white             -2.11    .18   -.64   -.53    .13
20. bitter-sweet            -2.22   -.30    .16   -.82    .43
21. happy-sad                2.09    .97    .61   1.50      ?
22. sharp-dull                  ?   1.31   1.88    .53      ?
23. empty-full               -.62  -1.22   -.05   -.72   1.47
24. ferocious-peaceful      -2.25    .25    .44    .16   -.09
25. heavy-light             -1.60   1.68   -.92    .06      ?
26. wet-dry                  -.62    .35   -.46    .00      ?
27. sacred-profane           2.29    .58   -.25   1.04   -.24
28. relaxed-tense            2.17    .24   -.63    .62   -.30
29. brave-cowardly           1.45   1.56    .40   1.66   -.50
30. long-short                .59   1.01    .02    .72   -.38
31. rich-poor                1.31   1.33    .22   1.19   -.36
32. clear-hazy               1.92    .69      ?    .93   -.09
33. hot-cold                  .42    .83    .65    .57   -.50
34. thick-thin               -.35   1.48   -.37    .60   -.61
35. nice-awful               2.39   1.07   -.02   1.15   -.07
36. bright-dark              1.71    .78   1.32   1.07   -.21
37. treble-bass              1.15   -.18   1.42    .06   -.01
38. angular-rounded         -1.31    .30    .77   -.08    .42
39. fragrant-foul            2.32    .62    .23   1.12   -.31
40. honest-dishonest         1.99    .89    .10   1.50   -.37
41. active-passive            .30   1.64   1.39    .79   -.40
42. rough-smooth            -2.32    .28    .17   -.07    .31
43. fresh-stale              2.05    .82    .68   1.27   -.32
44. fast-slow                 .42   1.10   1.50    .63   -.02
45. fair-unfair              2.22    .89    .37   1.33   -.29
46. rugged-delicate         -2.41    .60    .05   1.10      ?
47. near-far                  .85   1.09      ?    .74   -.17
48. pungent-bland           -1.41    .66    .48    .06   -.39
49. healthy-sick             1.79   1.38    .63   1.81   -.54
50. wide-narrow               .60   1.24   -.14    .99   -.60

? Value illegible in the source scan.
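The two numerical comparisons differ in one respect worth seeing. The correlation r is computed from deviations about each column's mean, whereas φ works on the raw loadings and coordinates. As a result, φ is unaffected by a change of units but, unlike r, is affected by adding a constant. The sketch uses the first five scales of Tables 1 and 2 (Factor I loadings and Dimension I coordinates). Over only these five rows, the values will not match the r = .944 and φ = .952 reported for all 50 scales. Function names are illustrative.

```python
import numpy as np

def congruence(f, g):
    """Index of factorial similarity: sum(f*g) / sqrt(sum(f^2) * sum(g^2)),
    computed on raw values without subtracting means."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(f @ g / np.sqrt((f @ f) * (g @ g)))

def pearson_r(f, g):
    """Ordinary product-moment correlation across variables."""
    return float(np.corrcoef(f, g)[0, 1])

# good-bad, large-small, beautiful-ugly, yellow-blue, hard-soft
factor_I = np.array([0.88, 0.06, 0.86, -0.33, -0.48])     # Table 1, Factor I
dimension_I = np.array([2.29, 0.12, 2.40, -0.31, -1.39])  # Table 2, Dimension I

print(round(pearson_r(factor_I, dimension_I), 3))
print(round(congruence(factor_I, dimension_I), 3))
```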

Table 3 presents a comparison between factors and dimensions. The descriptive adjective pairs are placed in one of the following categories: variables with both heavy loadings and high coordinates (Both), variables with heavy loadings but low coordinates (Factor Only), and variables with light loadings but high coordinates (Dimension Only). The values of r and φ between each factor and its corresponding dimension are given at the head of each section of the table.

TABLE 3

RELATIONS BETWEEN FACTORS (METHOD I) AND DIMENSIONS (METHOD II)

Adjective Pairs               Factor  Dimension

Factor I (criterion .80) and Dimension I (criterion 2.25): r = .944, φ = .952
  Both
    good-bad                     .88       2.29
    nice-awful                   .87       2.39
    beautiful-ugly               .86       2.40
    fragrant-foul                .84       2.32
    sweet-sour                   .83       2.29
    clean-dirty                  .82       2.38
    pleasant-unpleasant          .82       2.38
    sacred-profane               .81       2.29
  Factor Only
    honest-dishonest             .85       1.99
    fair-unfair                  .83       2.22
  Dimension Only
    rugged-delicate             -.42      -2.41
    rough-smooth                -.46      -2.32

Factor II (criterion .50) and Dimension II (criterion 1.30): r = .421, φ = .622
  Both
    strong-weak                  .62       1.81
    large-small                  .62       1.76
    heavy-light                  .62       1.68
  Factor Only
    rugged-delicate              .60        .60
    hard-soft                    .55       1.06
  Dimension Only
    active-passive               .04       1.64
    brave-cowardly               .44       1.56
    thick-thin                   .44       1.48
    deep-shallow                 .46       1.46
    healthy-sick                 .17       1.38

Factor III (criterion .50) and Dimension III (criterion 1.30): r = .639, φ = .722
  Both
    fast-slow                    .70       1.50
    active-passive               .59       1.39
    sharp-dull                   .52       1.88
  Factor Only
    (none)
  Dimension Only
    treble-bass                  .06       1.42
    bright-dark                  .26       1.32

CHARLES E. OSGOOD AND GEORGE J. SUCI
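The sorting into “Both,” “Factor Only,” and “Dimension Only” can be stated as a small rule. This sketch compares absolute values with the criteria. That choice is an assumption, based on Table 3 listing rugged-delicate (-2.41) as exceeding the 2.25 criterion. The entries are Factor I and Dimension I values from Tables 1 and 2, and the function name is hypothetical.

```python
def classify(pairs, load_crit, coord_crit):
    """Sort (name, loading, coordinate) triples into Table 3's categories,
    comparing absolute values with the two criteria (assumed)."""
    groups = {"Both": [], "Factor Only": [], "Dimension Only": []}
    for name, loading, coord in pairs:
        heavy = abs(loading) > load_crit    # "heavily loaded" on the factor
        high = abs(coord) > coord_crit      # "high coordinate" on the dimension
        if heavy and high:
            groups["Both"].append(name)
        elif heavy:
            groups["Factor Only"].append(name)
        elif high:
            groups["Dimension Only"].append(name)
    return groups

factor_i = [
    ("good-bad", 0.88, 2.29),
    ("honest-dishonest", 0.85, 1.99),
    ("fair-unfair", 0.83, 2.22),
    ("rugged-delicate", -0.42, -2.41),
    ("rough-smooth", -0.46, -2.32),
    ("high-low", 0.59, 1.35),   # meets neither criterion, so it is not listed
]
groups_I = classify(factor_i, 0.80, 2.25)
print(groups_I)
```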

I. Evaluation. The high similarity between Dimension I and Factor I is apparent from both φ (= .952) and r (= .944) and from the agreement between variables considered high on both. Even the variables that meet the criterion on only one method are actually close to the criterion on the other. This again testifies to the prominence and stability of the evaluative component in semantic judgment. That Dimension I is evaluative is obvious in a catalogue of the high-coordinate variables: beautiful-ugly, nice-awful, clean-dirty, pleasant-unpleasant, and so on. The evaluative “dimension” also draws in delicate-rugged and smooth-rough, which are not quite as prominent in the first analysis.

II. Potency. The potency variable displays the lowest correspondence between factors and dimensions, but even here the evidence is satisfactory. The correlation over all 50 variables is .421, with a φ of .622. The three most heavily loaded variables on Factor II are also the three variables having the highest coordinates on Dimension II: strong-weak, large-small, and heavy-light. Of the two variables meeting the factor-loading criterion only, hard-soft does have a sizable coordinate on Dimension II, but rugged-delicate clearly has become an evaluative variable in the forced-choice method. Of the variables meeting the high-coordinate criterion only, three also have sizable loadings on Factor II (brave-cowardly, thick-thin, and deep-shallow). Healthy-sick has nearly as high a coordinate on the evaluative dimension (1.79), where it belongs according to the first analysis, and active-passive has nearly as high a coordinate on the activity dimension (1.39), where it belongs according to the first analysis.

III. Activity. Dimension III and Factor III correlate .639, with a φ of .722. The factor is also clearly interpretable as an activity factor from both loadings and coordinates. The three most highly loaded variables, sharp-dull, active-passive, and fast-slow, are also among the four variables having the highest coordinates on Dimension III. No variable meets the factor-loading criterion without also meeting the coordinate criterion. Of the two variables meeting only the coordinate criterion, bright-dark is actually higher on the evaluative dimension, as it also is on the evaluative factor in Analysis I. Treble-bass does not correspond to the results of the first analysis, but its loading on the activity dimension does correspond to the findings of earlier studies on synesthesia.


DISCUSSION


The two factor analytic studies re- 
ported in this paper yield highly similar 
structures among the relations of 50 
bipolar descriptive scales. The first 
factor to appear in both studies is clearly 
evaluative in nature and accounts for 
more than half of the extractable vari- 
ance. The second and third factors to 
appear in both studies seem to represent 
potency and activity factors in semantic 
judgments, respectively, and again there 
is considerable correspondence between 
the two analyses. Since entirely dif- 
ferent subjects and entirely different 
methods of collecting data (concepts 
rated on scales in the first analysis and 
forced choice among scales themselves
in the second analysis) were employed 
in the two analyses, this over-all cor- 
respondence of the first three factors to 
emerge in both studies increases our 
confidence that we are isolating some- 
thing basic to the structuring of human 
judgments. What is perhaps remark- 
able is that such a large portion of the 
total variance in human judgment or 
meaning can be accounted for in terms 
of such a small number of basic variables. 
In the first study, for example, almost 50% of the total variance of judgments of 20 varied concepts against 50 varied scales by 100 Ss can be accounted for in terms of these three factors, and these were college student Ss.
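The share of total variance attributable to a factor is conventionally the sum of its squared loadings divided by the number of variables, assuming standardized variables. The sketch below applies that rule to a small hypothetical loading matrix; it does not reproduce the authors' centroid solution.

```python
from typing import List


def variance_shares(loadings: List[List[float]]) -> List[float]:
    """loadings[i][k] is the loading of variable i on factor k.
    Returns each factor's proportion of the total variance, assuming
    standardized variables (each contributes unit variance)."""
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    return [sum(row[k] ** 2 for row in loadings) / n_vars for k in range(n_factors)]


# Hypothetical 4-variable, 2-factor loading matrix.
example_loadings = [
    [0.8, 0.0],
    [0.6, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
shares = variance_shares(example_loadings)
print(shares, sum(shares))  # per-factor shares and their combined share
```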

This is not taken to imply that these 
three, largely connotative factors repre- 
sent an exhaustive description of the 
meaning space. There is evidence in our 
data for a large number of “specific” 
factors, quite possibly denotative in 
nature and representative of the ways 
in which our sensory nervous systems are 
capable of differentiating input signals 
(e.g., hot-cold, black-white, wet-dry,
treble-bass, and so on). When used 
connotatively, such descriptive scales tend 
to rotate into one of the first three factors 
(e.g., hot-cold is activity connotatively, 
white-black is evaluation connotatively, 
and so forth), but when used denotatively 
in judging sensorily relevant concepts 
such scales represent independent factors 
(e.g., when ice cream and baked potatoes 
are compared on hot-cold and objects 
varying in brightness are judged on 
white-black). One of the reasons for 
the failure of our factor analytic work to 
date to bring out such denotative factors 
in sufficient magnitude to be isolated is 
the method of sampling scales employed 
—a frequency-of-usage method which 
overemphasized the readily available 
evaluative alternates. In research now 
being planned we intend to use Roget’s 
Thesaurus as a source of scales, in other 
words, a logically exhaustive coverage 
rather than an availability sampling. 
Since three dominant connotative dimensions have already been isolated, these factors can be given merely token representation in subsequent factor analytic work.

Finally it should be noted that there 
is a tendency in both analyses for what 
might be called a convergence of scales 
toward a single, composite good-strong- 
active vs. bad-weak-passive factor. In 
other words, although we have evidence 
in both analyses for three independent 
connotative factors—evaluation, po- 
tency, and activity—specific scales repre- 
senting the second two factors tend to be 
contaminated with a pervasive evaluativeness. Scales like wide-narrow, brave-cowardly, rough-smooth, and healthy-sick (Factor II) and like young-old, ferocious-peaceful, tense-relaxed, and bright-dark (Factor III) also have as high or higher loadings or coordinates on evaluation.
This is quite probably a characteristic of 
our culture—and possibly all human 
cultures—that both potency and activity 
(rather than weakness and passivity) are 
positive values. This means that it is 
difficult to discover specific connotative 
scales to represent purely our second and 
third factors even though these factors 
as such are demonstrable. The shift of 
such scales as those above from denota- 
tive to connotative usage is probably one 
of the reasons for differences between 
Factor Analyses I and II. When the 
concepts BOULDER and FEATHER, for 
example, are judged against rugged- 
delicate and later against evaluative 
scales, the former is clearly rugged and 
the latter clearly delicate denotatively, 
but both are judged near “4” on evalu- 
ation, i.e., irrelevant, and the correlation 
between this scale and evaluation is zero. 
But when no specific concepts are used, 
as in our second method, the pervasive 
evaluative connotation dominates the 
scale-pairing, pleasant-unpleasant vs. 
delicate-rugged, for example, and most 
Ss encircle delicate as going with pleasant, much as we speak connotatively of having had a rugged (hard, unpleasant) time.


SUMMARY 


Two factor analytic studies of meaningful 
judgments are reported in this paper, both based 
upon the same sample of 50 bipolar descriptive 
scales. The first analysis applied Thurstone’s 
centroid method to correlations derived from 
7-step graphic scale data obtained by having 
100 Ss judge 20 specific concepts against the 50 
scales. The second analysis applied a new 
method developed by the second author to a 
matrix of percentages of agreement obtained by 
having 40 different Ss make forced-choice 
pairings of the polar terms themselves, i.e., 
without any specific concepts being judged. 
The first three factors to appear in both analyses 
show considerable correspondence, both in order 
of appearance and magnitude and in the par- 
ticular scales which define them. The evidence 
as a whole points to the existence in meaningful 
judgments of three major connotative factors: 
evaluation, potency, and activity. The evalu- 
ative factor accounts for by far the largest 
portion of the extracted variance. These three 
factors are taken as independent dimensions of 
the semantic space within which the meanings 
of concepts may be specified. 
Since the completion of the experiment reported herein, Lawson (2) has published data from a study involving a high-reward and a low-reward group, indicating that the strength of secondary reward as inferred from acquisition and extinction measures does
not vary with the amount of primary
reward. The experiment being re- 
ported here utilized five primary rein- 
forcement groups receiving amounts 
of food ranging from .05 to 2.40 gm., 
and one group receiving .20 gm. im- 
pregnated with saccharine. The Ss 
were reinforced on a black-white dis- 
crimination for entering the white box. 
They then were trained on a single- 
unit T maze to run to the white box 
in the absence of any food. Differ- 
ences in the performance of various 
groups on the T maze would be as- 
sumed to be a function of differences 
in the effectiveness of the white box 
as a secondary reinforcer. Such dif- 
ferences in effectiveness as might be 
indicated would result from the dif- 
ferent amounts of food presented in 
the white box during discrimination 
training. 

METHOD 


Subjects.—The Ss were 102 naive male and 
female rats of the Sprague-Dawley strain, 60 to 130 days old at the beginning of the experiment, and they were randomly assigned to six groups.

1 This paper constitutes a portion of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate College of the University of Illinois, 1952. The writer wishes to express his sincere gratitude to Dr. G. Robert Grice, who served as his advisor, for his valuable suggestions and criticisms; and to Dr. L. I. O'Kelly and Dr. O. H. Mowrer, who served as members of his committee.

2 Now at Tulane University.

Apparatus—The apparatus consisted of a 
black-white discrimination box and a single-unit 
T maze. The discrimination box was a modifi- 
cation of the type used by Grice (1). From the 
starting box, 4 in. high, 4 in. wide, and 15 in. 
long, S traversed a 2-in. wide, 12-in. long alley to 
a choice chamber, which was 6 in. long and 
tapered from 2 in. wide at its junction with the 
starting alley to 8.5 in. wide at its junction with 
the entrance to the two goal boxes. The goal 
boxes were 4 in. wide, 4 in. high, and 15 in. long. 
Curtains corresponding in color to the goal boxes
were hung inside 2 in. from the entrance. The 
entire apparatus was painted neutral gray with 
the exception of the goal boxes. The top was 
covered with hinged sections of hardware cloth. 
Vertical sliding doors were placed at the exit of 
the start box and at the entrance to the goal
boxes. 

The maze was a single-unit T maze with starting box, alleys, and goal boxes all 4 in. wide and 4 in. high. The starting box was 14 in. long. The goal boxes had the same dimensions as the goal boxes on the discrimination apparatus. The stem of the T was 24 in. long and each arm was 13 in. long. The entire apparatus, with the exception of the white and black goal boxes, was painted gray. The goal boxes could be placed in position at either arm of the T. Gray-painted vertical sliding doors were placed at the exit of the starting box, at the entrance to each arm of the T, and at the entrance to each goal box.

Training procedure.—All Ss were placed on a 
23-hr. food deprivation schedule one week prior 
to any training. They were fed 9 gm. of Purina 
Lab Chow in individual feeding cages during the 
1-hr. feeding period. The total daily ration was 
maintained at 9 gm. throughout the experiment. 
Water was always available both in individual 
feeding cages and living cages. 

On the sixth day of the seven-day adaptation 
period each S was given three trials in the T 
maze to determine individual right- and left- 
turning preferences. The white and black end 
boxes were removed from the arms of the maze, 
and no food reinforcement was given. 

During discrimination training Ss in each group received one of the following five amounts of food in the white goal box: .05, .20, .60, 1.20, or 2.40 gm. The animals in the sixth group received .20 gm. of food pellets with a .33% saccharine concentration. The food for all groups was in the form of .05 gm. pellets.

TABLE 1

ERROR, TIME, AND TRIAL MEDIANS FOR THE DISCRIMINATION PROBLEM
(N = 17 each group)

Amount of Food         30 Free Trials        To Criterion
Reinforcement (gm.)    Errors    Time        Trials    Time
.05                    3         2.1         14        3.4
.20                    3         1.8         15        3.0
.60                    2         2.1         13        3.6
1.20                   3         1.8         15        2.8
2.40                   4         2.2         17        2.9
.20 S                  2         1.9         15        3.8

S = saccharine-impregnated food.

All Ss were reinforced for entering the white 
goal box, the position of which was shifted from 
right to left according to the following sequence: 
RLRRLLRLLRLRLLRRLRRL. The sequence 
was repeated for a total of 60 trials. Each S was 
given two trials per day. The first trial on each
day was a free-choice trial and the second was
always a forced trial to the other color box. 
Animals were retained in the white box until all 
the food pellets were eaten, and in the black box 
for an arbitrary 10-sec. period. 
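The position sequence and the free/forced pairing described above can be laid out as a trial schedule. This is a sketch with hypothetical function names. It assumes trials are numbered 1 to 60, with the free-choice trial first on each day, and it reads "the other color box" as the color not chosen on that day's free trial.

```python
from typing import List, Tuple

# White-box position sequence from the text (R = right, L = left).
SEQUENCE = "RLRRLLRLLRLRLLRRLRRL"


def white_box_sides(n_trials: int = 60) -> List[str]:
    """Repeat the 20-trial sequence until n_trials positions are listed."""
    repeats = -(-n_trials // len(SEQUENCE))  # ceiling division
    return list((SEQUENCE * repeats)[:n_trials])


def schedule(n_trials: int = 60) -> List[Tuple[int, int, str, str]]:
    """(trial, day, trial type, white-box side); two trials per day,
    the first free-choice and the second forced."""
    plan = []
    for trial, side in enumerate(white_box_sides(n_trials), start=1):
        day = (trial + 1) // 2
        kind = "free" if trial % 2 == 1 else "forced"
        plan.append((trial, day, kind, side))
    return plan


def forced_color(free_choice: str) -> str:
    """Assumption: the forced trial goes to the color not chosen on the free trial."""
    if free_choice not in ("white", "black"):
        raise ValueError("free_choice must be 'white' or 'black'")
    return "black" if free_choice == "white" else "white"


plan = schedule()
print(plan[:4])
print(len(plan), "trials over", plan[-1][1], "days")
```

Because the 20-trial sequence contains ten R and ten L positions, three repetitions balance the white box's side over the 60 trials.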

During the first 10 training days (first 20 
trials) S was allowed to remain in the choice 
chamber until it entered the black box on a 
forced trial. After Trial 20, if S did not enter 
the black box on the forced trial within 30 sec. 
after leaving the starting box, it was removed 
from the choice chamber, and the response time 
recorded as 30 sec. 
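The 30-sec. cutoff on forced trials after Trial 20 amounts to capping the recorded time. A minimal sketch follows; the function name is hypothetical, and a forced trial on which S never entered the black box before removal is represented as `None`.

```python
from typing import Optional

CUTOFF_SEC = 30.0       # forced-trial limit applied after Trial 20
CUTOFF_FROM_TRIAL = 21


def recorded_forced_time(trial: int, latency_sec: Optional[float]) -> float:
    """Time recorded for a forced trial. latency_sec is None (an assumed
    convention) when S had not entered the black box and was removed."""
    if trial < CUTOFF_FROM_TRIAL:
        # Trials 1-20: S stayed in the choice chamber until it entered the black box.
        if latency_sec is None:
            raise ValueError("no cutoff was applied on Trials 1-20")
        return latency_sec
    if latency_sec is None or latency_sec > CUTOFF_SEC:
        return CUTOFF_SEC
    return latency_sec


print(recorded_forced_time(24, None), recorded_forced_time(24, 12.5))
```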

On the day following Trial 60 on the discrimi- 
nation, all Ss were given 20 trials on the T maze. 
For each S the white box was placed at the end 
of the nonpreferred arm of the T. No food was 
presented at any time. A clean empty food dish 
of the type used in the discrimination box was 
placed in the white box. 
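Placing the white box on each S's nonpreferred arm requires turning the three preference trials into a preferred side. The text does not say how this was done. The sketch assumes a simple majority of the three turns, which cannot tie, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def preferred_side(turns: Sequence[str]) -> str:
    """Assumption: majority side ('R' or 'L') over an odd number of
    unrewarded preference trials."""
    if len(turns) % 2 == 0 or any(t not in ("R", "L") for t in turns):
        raise ValueError("expected an odd number of 'R'/'L' turns")
    return Counter(turns).most_common(1)[0][0]


def white_box_arm(turns: Sequence[str]) -> str:
    """Put the white box at the end of the nonpreferred arm."""
    return "L" if preferred_side(turns) == "R" else "R"


print(white_box_arm(["R", "R", "L"]))  # prints L
```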

The S was released from the start box by 
raising the vertical sliding door. The doors at 
the entrances to the arms of the T were open but 
those at the entrances to the goal boxes were 
closed. When S started down one arm of the 
T, the door at the entrance was closed to prevent 
retracing, and the door of the goal box at the end 
of that arm opened. When S entered the goal 
box the door was closed and S was detained for 
3 sec. The trials were spaced by a variable 
interval ranging from 5 to 10 min. When S did 
not enter the black box within 30 sec. after
leaving the start box, it was removed from the 
maze and the trial scored as incorrect with a 
response time of 30 sec. 


RESULTS 


Table 1 presents the error, time, 
and trial medians for each reinforce- 
ment group on the discrimination 
problem. Data from forced trials 
were not included in the calculation 
of any of these medians. Although 
each group was run until it completed 
the total of 60 trials, an arbitrary 
acquisition criterion had been set at 
10 successive correct responses on the 
free trials. 
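The acquisition criterion of 10 successive correct free-trial responses can be scored as below. The sketch assumes that trials to criterion counts free trials up to and including the tenth correct response of the first such run; the text does not spell out the counting rule. Group medians, as in Table 1, can then be taken with `statistics.median`. The records are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def trials_to_criterion(correct: Sequence[bool], run_length: int = 10) -> Optional[int]:
    """Free trials needed to complete the first run of `run_length`
    consecutive correct responses (assumed to include the run itself);
    None if the run never occurs."""
    streak = 0
    for trial, ok in enumerate(correct, start=1):
        streak = streak + 1 if ok else 0
        if streak == run_length:
            return trial
    return None


# Hypothetical free-trial records (True = entered the white box) for three Ss.
records: List[List[bool]] = [
    [False, True, False] + [True] * 10,
    [True] * 10,
    [False] * 4 + [True] * 10,
]
scores = [trials_to_criterion(r) for r in records]
print(scores, median(scores))  # prints [13, 10, 14] 13
```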

The Mann-Whitney U test indi- 
cated that none of the differences 
between the groups for any of the 
response measures was significant at 
the .05 level. 
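To show the mechanics of the comparison, the sketch below computes the Mann-Whitney U statistic for two groups using midranks for tied values. It obtains a two-sided p value from the large-sample normal approximation with the usual tie correction; this approximation is a convenience choice here, and an exact test may be preferable for small groups. The function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Rank values from 1, giving tied values the mean of their ranks.
    Also returns the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes: List[int] = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) via the normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    ranks, tie_sizes = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:  # every observation tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical median running times (sec.) for two groups.
print(mann_whitney_u([4.8, 5.1, 3.9, 6.0], [5.6, 6.2, 5.9, 7.1]))
```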

The mean number of responses 
made to the white box of the T maze 
by each group is presented in Table 2. 
These means were calculated both for 
the first 15 trials and for the total of 
20 trials. The F ratio for the 15-trial 
means was 3.30 and for the 20-trial 
means was 1.68. Neither of these is 
significant at the .05 level (F = 4.40 
required for significance at the .05 
level).
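The F ratios above come from a one-way analysis of variance across the groups. A minimal sketch of that computation follows. With six groups of 17 Ss each, the degrees of freedom would be 5 and 96. The data in the example are hypothetical, and the sketch does not look up critical values.

```python
from typing import Sequence, Tuple


def one_way_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way ANOVA: returns (F, between-groups df, within-groups df)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical correct-response counts for two small groups.
print(one_way_f([[1, 2, 3], [4, 5, 6]]))  # prints (13.5, 1, 4)
```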


TABLE 2

MEAN CORRECT RESPONSES AND MEDIAN TIMES FOR T-MAZE PROBLEM
(N = 17 each group)

Amount of Food         Mean Correct Responses     Median Time
Reinforcement (gm.)    15 Trials    20 Trials     Correct R's    Incorrect R's
.05                    10.7         14.0          4.8            11.1
.20                    10.2         13.6          5.6            10.7
.60                    10.4         13.9          5.6            11.9
1.20                   10.5         13.9          5.6            11.2
2.40                   10.1         13.6          6.0            12.4
.20 S                  [illegible]  12.8          6.2            9.5

S = saccharine-impregnated food.
The median response times on the
correct and incorrect trials are pre- 
sented separately in Table 2. Group 
medians for each of these classifi- 
cations were compared by means of 
the Mann-Whitney U test. None of 
the differences between groups was 
significant at the .05 level. 

Since the results of the analyses of 
variance of trial scores and the U test 
on time scores do not warrant the 
rejection of the hypothesis of no
difference among the groups, the data 
from all groups were combined for a 
further analysis of the effectiveness of 
the reinforcing properties of the white 
goal box. The percentage of white 
box responses on each trial for all six 
groups is presented in Fig. 1. This 
figure also presents the median run- 
ning time on correct responses for all 
groups combined, plotted as a func- 
tion of trials. 
Fig. 1. ... and per cent correct responses (upper curve) on T maze for all Ss (N = 102).
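The combined curves described for Fig. 1, the per cent of white-box responses on each trial and the median running time on correct responses, can be computed from per-S records as sketched below. The data layout and names are assumptions, and the example uses three hypothetical Ss and two trials.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def per_trial_summary(
    correct: Sequence[Sequence[bool]],
    times: Sequence[Sequence[float]],
) -> List[Tuple[int, float, Optional[float]]]:
    """correct[s][t] is True if S number s ran to the white box on trial t;
    times[s][t] is that trial's running time. Returns, per trial,
    (trial number, per cent white-box responses, median time on correct responses)."""
    n_subjects = len(correct)
    summary = []
    for t in range(len(correct[0])):
        hits = [s for s in range(n_subjects) if correct[s][t]]
        pct = 100.0 * len(hits) / n_subjects
        med = median([times[s][t] for s in hits]) if hits else None
        summary.append((t + 1, pct, med))
    return summary


# Hypothetical records for three Ss on two trials.
correct = [[True, False], [True, True], [False, True]]
times = [[2.0, 9.0], [4.0, 3.0], [8.0, 5.0]]
for row in per_trial_summary(correct, times):
    print(row)
```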


On the turning-preference test 
given prior to any training, exactly 
one-half of the 102 Ss showed a pref- 
erence for each arm of the T. During 
T-maze training, when the white box 
was placed on the originally non- 
preferred side for each S, 95 Ss ran to 
the white box on more than half of 
the trials. 


DISCUSSION


Despite the fact that the magnitudes 
of food reinforcement used in this experi- 
ment ranged from .56% to 26% of the 
total daily ration, there is no evidence 
that performance on the discrimination 
problem is differentially influenced by 
magnitude of reinforcement. These re- 
sults are in substantial agreement with 
those reported by Reynolds (3) for the 
same general type of problem. 
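The percentages in the first sentence of this paragraph follow from dividing each reward by the 9-gm. daily ration. The quick check below reproduces them; 2.40/9 comes to about 26.7%, so the upper figure of 26% is truncated rather than rounded.

```python
DAILY_RATION_GM = 9.0  # total daily ration stated in the procedure
AMOUNTS_GM = [0.05, 0.20, 0.60, 1.20, 2.40]


def percent_of_ration(amount_gm: float, ration_gm: float = DAILY_RATION_GM) -> float:
    """Reward amount as a percentage of the daily ration."""
    return 100.0 * amount_gm / ration_gm


for amount in AMOUNTS_GM:
    # prints 0.56%, 2.22%, 6.67%, 13.33%, 26.67% in turn
    print(f"{amount:.2f} gm = {percent_of_ration(amount):.2f}% of the daily ration")
```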

All T-maze measures of performance 
indicate that the white box was effica- 
cious in strengthening the response of 
turning toward it. The percentage of 
correct responses for Ss in all groups 
combined increased from 56.5% on the 
first two trials to 82.5% on the fifth 
trial. The decrease in percentage of 
correct responses on trials following the 
fifth might reasonably be attributed to 
the extinction of the conditioned rein- 
forcing effect of the white box stimuli. 
Analysis of the median response times 
for all Ss making the white box response 
presents the same general picture. 

There is no evidence that the effective- 
ness of the secondary reinforcement 
varied as a function of the amount of 
food reinforcement with which it had 
been paired. The performance of the 
group receiving saccharine-impregnated 
food did not differ significantly from that 
of the group receiving the same amount 
of plain food. 


SUMMARY 


Six groups of 17 albino rats were trained on a 
black-white discrimination with food reinforce- 
ment in the amounts of .05, .20, .60, 1.20, and 
2.40 gm., and .20 gm. saccharine-impregnated.
Following discrimination training all Ss were
given 20 trials on a single-unit T maze. A white
box similar to the one in which Ss received food 
during discrimination training was placed at the 
end of the nonpreferred arm of the T for each 
S. No food reinforcement was given during T- 
maze training. 

Results from the T-maze training showed that 
the white box acquired secondary reinforcing 
properties, but the effectiveness of the secondary 
reinforcement did not vary as a function of the 
quantity or quality of the primary food rein- 
forcement with which it had been paired. 